ABSTRACT

The Fermi Large Area Telescope First Source Catalog (1FGL) provided spatial,
spectral, and temporal properties for a large number of γ-ray sources using a uniform
analysis method. After correlating with the most-complete catalogs of source types
known to emit γ rays, 630 of these sources are "unassociated" (i.e., have no obvious
counterparts at other wavelengths). Here, we employ two statistical analyses of the
primary γ-ray characteristics for these unassociated sources in an effort to correlate their
γ-ray properties with the AGN and pulsar populations in 1FGL. Based on the correlation
results, we classify 221 AGN-like and 134 pulsar-like sources in the 1FGL unassociated
sources. The results of these source "classifications" appear to match the expected
source distributions, especially at high Galactic latitudes. While useful for planning
future multiwavelength follow-up observations, these analyses use limited inputs, and
their predictions should not be considered equivalent to "probable source classes" for
these sources. We discuss multiwavelength results and catalog cross-correlations to date,
and provide new source associations for 229 Fermi-LAT sources that had no association
listed in the 1FGL catalog. By validating the source classifications against these new
associations, we find that the new association matches the predicted source class in
~80% of the sources.

Subject headings: catalogs - gamma rays: general - methods: statistical - galaxies:
active - pulsars: general



1. Introduction 






Astrophysical sources of high-energy γ rays (photon energies above 10 MeV), although
inherently interesting as tracers of energetic processes in the Universe, have long been hard
to identify. Only four of the 25 sources in the second COS-B catalog had identifications
(Swanenburg et al. 1981), and over half the sources in the third EGRET catalog had no
associations with known objects (Hartman et al. 1999). A principal reason for the difficulty
of finding counterparts to high-energy γ-ray sources has been the large positional errors in
their measured locations, a result of the limited photon statistics and angular resolution of
the γ-ray observations and the bright diffuse γ-ray emission from the Milky Way. In addition,
a number of the COS-B and EGRET sources were determined to be spurious by follow-up
analysis and observations.

A major step forward for detection and identification of high-energy γ-ray sources came when
the Gamma-ray Large Area Space Telescope (GLAST) was launched on 2008 June 11. It began its
scientific operations two months later, and shortly thereafter, it was renamed the Fermi Gamma-ray
Space Telescope. Its primary instrument is the Large Area Telescope (LAT; Atwood et al. 2009), the
successor to the Energetic Gamma-Ray Experiment Telescope (EGRET) on the Compton Gamma-Ray
Observatory (Thompson et al. 1993). The LAT offers a major increase in sensitivity over
EGRET, allowing it to study the 100 MeV to ~300 GeV γ-ray sky in unprecedented detail.

The high sensitivity, improved angular resolution and nearly uniform sky coverage of the LAT
make it a powerful tool for detecting and characterizing large numbers of γ-ray sources. The
Fermi-LAT First Source Catalog (1FGL; Abdo et al. 2010a) lists 1451 sources detected during the
first 11 months of operation by the LAT, of which 821 were shown to be associated with at least one
plausible counterpart. Of these, 698 were extragalactic (mostly Active Galactic Nuclei, or AGNs)
and 123 were Galactic (mostly pulsars and supernova remnants, but also pulsar wind nebulae and
high-mass X-ray binaries). After the publication of the 1FGL catalog, the association panorama
evolved very quickly with the release of the catalog of AGNs (1LAC; Abdo et al. 2010k) as well
as a catalog of pulsar wind nebulae (PWNe) and supernova remnants (SNRs) (Ackermann et al.
2011).



Here, as a starting point for our multivariate classification strategy, we consider the entire
original list of 630 1FGL sources that remain unassociated with plausible counterparts at other
wavelengths. A plausible counterpart is a member of a known or likely γ-ray emitting class located
close to the 95% uncertainty radius of a given 1FGL source, with an association confidence of 80% or
higher (Abdo et al. 2010a). The 95% uncertainty radii for 1FGL source locations are typically 10'.
While greatly improved over the degree-scale uncertainties of previous instruments, these position
measurements are still inadequate to make firm identifications based solely on location.

We have taken a multi-pronged approach toward understanding these unassociated 1FGL
sources, using all the available information about the γ-ray sources. Information about locations,
spectra, and variability has been combined with properties of the established γ-ray source classes
and multiwavelength counterpart searches.

Here we look in depth at the properties of the 1FGL unassociated sources, and investigate the
implications of those characteristics. Specifically, this paper addresses five primary questions:

1. What do the γ-ray properties of the unassociated 1FGL sources reveal about these sources
(Section 2)?

2. What does our understanding of the γ-ray properties of the associated sources suggest about
the possible source class for each of the 1FGL unassociated sources (Section 3)?

3. What new associations or multiwavelength counterparts have been found beyond those from
the first LAT catalog (Section 4)?

4. Do the new classifications properly predict sources that have been associated since the release
of the 1FGL catalog (Section 5)?

5. What do the new classifications and associations imply about the existence of unknown new
γ-ray source classes (Section 6)?

Although the 2FGL catalog (Abdo et al. 2011b) was being developed in parallel with the present
work, we focus on the 1FGL results, where some follow-up results are available for comparison with
the methods of this work. Such follow-up observations for 2FGL have yet to be done.



2. Gamma-ray properties of unassociated Fermi-LAT sources 



In the 1FGL catalog (Abdo et al. 2010a; hereafter "1FGL"), source identifications and
associations were assigned through an objective procedure. For a source to be considered identified
in the 1FGL catalog, detection of periodic emission (pulsars or X-ray binaries) or variability
correlated with observations at other wavelengths (blazars) was required. Additionally, measurement
of an angular extent consistent with observations at other wavelengths was used to declare
identifications for a few sources associated with SNRs and radio galaxies (Abdo et al. 2009d, 2010d,f,h).
Associations were reported only for sources with positional correlations between LAT sources and
members of plausible source classes (based on Bayesian probabilities of finding a source of a given
type in a LAT error box). This automated procedure was based on a list of 32 catalogs that contain
potential counterparts of LAT sources based either on prior knowledge about classes of high-energy
γ-ray emitters or on theoretical expectations. In addition, it indicated coincident detections at
radio frequencies and TeV energies, as well as positional coincidences with EGRET and AGILE
sources.

In total, 821 of the 1451 sources in the 1FGL catalog (56%) were associated with at least one
counterpart by the automated procedure, with 779 being associated using the Bayesian method
while 42 are spatially correlated with extended sources based on overlap of the error regions and
source extents. From the simulations in 1FGL we expect that ~57 among the 821 sources (7%) are
associated spuriously in 1FGL. We found the initial list of unassociated sources by simply extracting
the list of 1FGL sources without any association from the 1FGL catalog. These sources are spread
across the sky, with about 40% located within 10° of the Galactic plane.

Sources without firm identifications that are in regions of enhanced diffuse γ-ray emission along
the Galactic plane or are near local interstellar cloud complexes (like Orion), sources that lie along
the Galactic ridge (300° < l < 60°, |b| < 1°), and sources that are in regions with source densities
great enough that their position error estimates overlap in the γ-ray data are called c-sources,
as their 1FGL designator has a "c" appended to indicate "caution" or "confused region". The
remainder of the unassociated sources did not have a "caution" designator in 1FGL, and here are
called "non-c" sources.

The positions, variability and spectral information given in the catalog provide an important
starting point for the characterization of LAT unassociated sources. We can easily compare intrinsic
properties of the 1FGL sources such as spectral index, curvature index and flux in different energy
bands for both associated and unassociated populations, potentially providing insight into the likely
classes of the unassociated sources.

For the 1FGL catalog, the limiting flux for detecting a source with photon spectral index
Γ = 2.2 and Test Statistic of 25 (TS = 2Δlog(likelihood); Mattox et al. 1996) varied across the sky
by about a factor of five (see Figure 19 of 1FGL). This non-uniform flux limit is due to the
non-uniform Galactic diffuse background and non-uniform exposure (mostly arising from the passage
of the Fermi observatory through the South Atlantic Anomaly).

As discussed in 1FGL, when the variability and spectral curvature properties of Fermi-LAT
sources are compared against each other, a clear separation is visible between bright sources with
AGN associations and those with pulsar associations. In Figure 1 (top panel), pulsars lie in the
lower right-hand quadrant and AGN lie in the upper half. However, in the lower left-hand quadrant
the two classes mix, making it difficult to distinguish between them. This region of parameter space
is home to much of the unassociated source population (bottom panel). A closer look at these and
other properties of the known sources gives clues to methods of separating the two major types,
allowing us to classify some of the unassociated sources as likely members of one of these two source
types (Section 3).



2.1. Source locations and flux distributions

The spatial distributions of the major source types (AGN, pulsars, unassociated sources) are
given in Table 1. It is clear that there is a significant excess of unassociated sources at low Galactic
latitudes (|b| < 10°) where 63% of the detected sources have no formal counterparts, compared to
only 36% unassociated at |b| > 10°.

Figure 2 shows the spatial distribution of LAT unassociated sources, with the positions of
non-c sources shown as crosses and the c-source positions given by circles. As for the EGRET (3EG)
catalog sources, the distribution is clearly not isotropic (Hartman et al. 1999). One consideration
when interpreting the distribution of unassociated 1FGL sources is that a number of the remaining
unassociated sources are in low Galactic latitude regions where catalogs of AGNs have limited or
no coverage, reducing the fraction of AGN associations. If we bin the different source types by
Galactic latitude (Figure 3), we see a clear absence of AGN associations in the central 10° of the
Galaxy (|b| < 5°), while in the same region there is a spike in the number of unassociated sources.

The unassociated sources have an average flux of 3.1 × 10^[…] ph cm^-2 s^-1 (E > 1 GeV), while
the associated population averages are 5.5 × 10^[…] ph cm^-2 s^-1 for pulsars and 2.7 × 10^[…] ph cm^-2
s^-1 for AGN.

In the Galactic plane a γ-ray source must be brighter than at high latitudes in order to be
detected above the strong Galactic diffuse emission. Figure 4 (left panel) shows the 1FGL source
flux distribution versus Galactic latitude for three longitude bands. It is clear that the Galactic
plane (|b| < 2.5°) is dominated by Galactic diffuse emission, raising the flux detection threshold
to > 5 × 10^[…] ph cm^-2 s^-1. This is reflected in the average flux of the unassociated c-sources,
which at 8.2 × 10^[…] ph cm^-2 s^-1 is significantly higher than that for the non-c unassociated source
population (1.7 × 10^[…] ph cm^-2 s^-1). Outside the central region of the Galaxy, the flux threshold
is lower than that shown in Figure 4.

As was the case for COS-B and EGRET, it is likely that a subset of the unassociated sources
are spurious, resulting from an imperfect Galactic diffuse model. Such sources probably have very
low significance, poor localization, and a spectral shape that mimics that of the Galactic diffuse
emission itself. The c-sources in the 1FGL catalog are candidates to be sources of this type.

As discussed in Section 4.7 of 1FGL, the latitude distribution of the Galactic ridge (300° <
l < 60°) unassociated sources shows a sharp narrow peak in the central degree (|b| < 0.5°) of the
Galaxy (Figure 4, right). If this feature is not an artifact, and we assume these sources originate
in a Galactic population, then the scale height for this population must be ~50 pc, to keep the
average distance to the sources within the Galaxy. Such a scale height does not correspond to any
known population of γ-ray sources, making it likely that a number of the sources in the Galactic
ridge are spurious.
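
One way to read the scale-height argument is through simple geometry. The text does not spell out the calculation, so the relation below is an assumed small-angle sketch: a source at distance d and height z above the plane appears at latitude |b| with

```latex
z \simeq d \,\tan|b| ,
\qquad
d \simeq \frac{z}{\tan|b|}
  = \frac{50\ \mathrm{pc}}{\tan 0.5^{\circ}}
  \approx \frac{50\ \mathrm{pc}}{8.7\times10^{-3}}
  \approx 5.7\ \mathrm{kpc}.
```

Under this assumption, a population confined to |b| < 0.5° with z ~ 50 pc sits at typical distances of several kpc, comparable to Galactic scales. A substantially larger scale height would push the implied distances beyond the Galaxy.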
2.2. Spectral properties

The 1FGL catalog provides spectral information that may be useful for distinguishing between
different source classes. As part of the 1FGL analysis all sources were fit with a power-law spectral
form and the spectral indices were included in the catalog. In addition, the catalog includes a
"curvature index", which measures the deviation of the spectrum from the simple power-law form
for each source. This means the curvature index is more a measure of the quality of the power-law
spectral fit than of the intrinsic spectral shape. Figure 5 shows the distributions of the spectral index
(top panel) and curvature index (middle panel) with respect to flux. Neither of these parameters
appears to discriminate well between the AGN and pulsar populations. In addition, the relationship
is nearly linear for the curvature index, indicating that this parameter is strongly correlated with
flux. That is, fainter sources have relatively poorly measured spectra that cannot be measured
to be significantly different from power laws. This means that faint γ-ray sources provide less
discriminating information than bright sources.

The majority of γ-ray AGN are blazars, which are relativistic jet sources with the jets directed
toward the Earth. An important property of blazars is their typical γ-ray spectral index, which
offers some discrimination power between FSRQs (Flat Spectrum Radio Quasars) and BL Lacs
(Abdo et al. 2010j). The spectra of blazars in both of these sub-classes are typically well described
as broken power laws in the LAT energy range, and the distributions of spectral indices for FSRQs
and BL Lacs are compatible with Gaussians (Abdo et al. 2009a, 2010k). However, because pulsar
spectra are not well described by power laws, the spectral index of a power-law fit is not a good
discriminator between pulsar and AGN classes.

As mentioned before, Figure 5 (middle) shows that the curvature index for pulsars is strongly
correlated with flux. This is primarily because many of the pulsars detected in the 1FGL catalog are
strong γ-ray sources, with brighter pulsars having a more significant spectral curvature than fainter
pulsars. Unfortunately, the broken power-law spectral forms of bright blazars (e.g., Ackermann
et al. 2010) also have the effect of inducing a correlation between curvature index and flux for LAT
blazars.

There are few sources with significant detections in all of the five spectral bands used to
calculate the curvature index. Only 36 of the 630 unassociated sources are strongly-enough detected
to have flux measurements reported in each band in the 1FGL catalog, as this requires a TS > 10
in each energy range. By contrast, 181 of the unassociated sources are detected in only a single
band, with an additional 88 sources having upper limits in all the spectral bands (i.e., are only
detected when data at all energies are combined).



2.3. Variability properties



In the 1FGL catalog the variability index for each source was defined as the χ² of the deviations
of eleven monthly (30-day) source flux measurements from the average source flux (Abdo et al.
2010a). While this value increases with flux for AGN, it does not do so for the pulsars (Figure 5,
bottom panel), making variability a much better discriminator between the two major classes.



One property of blazars is that they are frequently significantly variable in γ rays (Abdo et al.
2010g). Their fluxes can vary up to a factor of five on time scales of a few hours and by a factor
of 50 or more over several months. As a consequence, their characteristic variability can serve as
a primary discriminator. This property has been used to turn some Fermi-LAT AGN associations
into identifications due to their timing properties (Abdo et al. 2010g; Vandenbroucke et al. 2010).



For variability to be a useful indicator, the time scale must be adapted to the source significance.
Indeed, for a faint source, the variability needs to be tested on longer time scales than for a bright
source. All sources in the 1FGL catalog were processed in the same way, regardless of flux. Thus,
for the many 1FGL sources not bright enough to be significantly detected on monthly time scales,
the 1FGL variability index is not a sensitive discriminator of variability.

Pulsars, on the other hand, are generally steady sources. Where variability has been seen in
γ rays, it has been attributed to flares in the nebular contribution of a PWN, rather than to the
pulsar itself (Tavani et al. 2011; Abdo et al. 2011c). This flux stability places pulsars in extreme
opposition to AGN in the γ-ray regime. Essentially any significant detection of variability in an
unassociated source is enough to make a pulsar classification extremely unlikely.

In the 1FGL catalog, 241 sources were found to be variable at a formal confidence level of
99% (variability index > 23.21). Of these, 2 are HMXBs, 221 are AGN, and 18 are unassociated.
Variability in bright sources is easier to detect as the source flux is typically above the sensitivity
threshold in each monthly bin. For the lower-flux unassociated sources, however, we need a method
to improve the detection of variability. Using the fractional variability (discussed in Section 3.1) is
one such method.
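
The link between the 99% confidence level and the threshold of 23.21 can be checked with a short sketch. It assumes the variability index is a weighted χ² over the eleven monthly fluxes with 10 degrees of freedom, and the weighting (statistical error plus a relative systematic term `f_rel`) is an assumption modeled on the catalog description, not a transcription of the catalog code. Function names are illustrative.

```python
import math

def variability_index(fluxes, sigmas, f_rel=0.03):
    """Chi-square of flux deviations about the weighted mean (assumed weighting)."""
    weights = [1.0 / (s ** 2 + (f_rel * f) ** 2) for f, s in zip(fluxes, sigmas)]
    flux_wt = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
    return sum(w * (f - flux_wt) ** 2 for w, f in zip(weights, fluxes))

def chi2_survival_even_dof(x, dof):
    """P(chi-square > x) for an even number of degrees of freedom (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total

# Chance probability of exceeding the quoted threshold for a steady source.
p_chance = chi2_survival_even_dof(23.21, 10)
```

With 10 degrees of freedom, `p_chance` comes out very close to 0.01, consistent with the quoted 99% confidence level.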



2.4. Comparison with source modeling 



We can also examine what source distributions we might expect given the populations of source
types in 1FGL. We do this by first estimating the total number of detected AGN in 1FGL: we
model the AGN population and then apply that model to the 1FGL catalog.

We use the Fermi-LAT LogN-LogS distribution (the distribution of the number N of sources
detectable at a given sensitivity S) for AGN (Abdo et al. 2010j), and the 1FGL sensitivity map
(Figure 19 of Abdo et al. 2010a) to generate a Galactic map that contains the number of the
expected AGN at each position in the sky. Summing these results over Galactic longitude we
obtain the AGN latitude profile shown in Figure 6. Integrating the AGN model allows us to
estimate the number of expected AGN in the sky. By subtracting the number of AGN found by
the model from the sources in the 1FGL catalog we obtain the number of Galactic sources in and
out of the plane. This gives the Galactic source estimates for the unassociated source list.

Table 2 compares the 1FGL source counts with this model for low and high-latitude regions.
It is clear that the group of sources that is most difficult to associate are those of Galactic origin
at low Galactic latitudes. This is likely due to the presence of a population of spurious sources in
that region in the 1FGL catalog.

At high Galactic latitudes, pulsars are the second most numerous class of identified γ-ray
sources; most of those are millisecond pulsars (MSPs). From the set of 1FGL pulsars and the new
pulsar associations discussed herein, we find that more than a third of the γ-ray pulsars known to
date are MSPs. If we then assume that ~50% of the 271 unassociated sources that are expected
to be Galactic sources are pulsars (based on the fraction of Galactic sources which are pulsars as
given in Table 1), and one third of those pulsars are MSPs, we find that we expect 45 new MSPs
in the 1FGL unassociated source list. Of the 31 new MSPs discovered to date (Section 4.2), 28 are
at high Galactic latitudes, suggesting that an expected number of 45 is not unreasonable.
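
The estimate of 45 MSPs follows directly from the two stated fractions, written out here for reference:

```latex
N_{\mathrm{MSP}} \approx 271 \times 0.5 \times \tfrac{1}{3} \approx 45 .
```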

At low Galactic latitudes the source content is more diverse, with half the sources being
compact objects (pulsars and X-ray binaries) and nearly half being extended sources (SNRs and
PWNe). If the unassociated sources have a similar distribution, then there will be ~100 pulsars
and ~100 SNRs/PWNe.



3. Classification of unassociated sources 

The spatial, spectral and variability properties discussed in Section 2 provide a framework that
allows us to try to predict the expected source classes for the sources that remain unassociated.
This is done by using the properties of the associated sources to define a model that describes the
distributions and correlations between measured properties of the γ-ray behavior of each source
class. This model is then compared to the γ-ray properties of each unassociated source. Generating
the model requires an associated source parent population with enough members to describe the
behavior well. For this reason, we have focused only on AGN and pulsars as the input source
populations.

To create a model, it is necessary to use γ-ray properties that are clearly different between the
parent populations. For 1FGL, the best parameters are the spectral index, curvature index, and
variability index, as well as hardness ratios between the different spectral bands. In addition, it is
important that the properties used to generate the model not be related to source significance, as
this will bias the results. However, as discussed in Section 2.2, the curvature index appears to be
dependent on the source flux, and thus is not a good indicator of the parent population. Also, the
spectral index and hardness ratios are both spectral indicators, so they overlap in functionality.
Since the hardness ratios provide more information about spectral shape than the spectral index,
they are preferred for this analysis.

To generate valid classifications, we must first define new parameters that allow intrinsic
properties to be compared rather than relative fluxes. With the new parameters in hand, we can
generate classification predictions using multiple methods, and compare these predictions to each
other. However, the results from these techniques assume that the training samples and test
samples have the same distributions of intrinsic properties. This is not true for the 1FGL unassociated
sources, as they are more frequently found in the plane of the Galaxy with elevated background
levels and in confused regions. To help compensate for this difference, we will validate the results
against an independent set of classified sources.



3.1. Improving source type discriminators

To mitigate the effect of low fluxes on the determination of the band spectra, it is necessary
to define additional comparative parameters that remove the significance dependency. In this case,
the 1FGL catalog provides a set of fluxes in five bands for each source from which we can find
hardness ratios. To get a normalized quantity the hardness ratios are constructed as:

$$HR_{ij} = \frac{\mathrm{EnergyFlux}_j - \mathrm{EnergyFlux}_i}{\mathrm{EnergyFlux}_j + \mathrm{EnergyFlux}_i} \tag{1}$$

This quantity will always be between -1 and 1; -1 for a very soft source (no flux in the
higher-energy band, EnergyFlux_j = 0) and +1 for a very hard source (no flux in the lower-energy
band, EnergyFlux_i = 0). Here energy flux in log(E) units (i.e., νF_ν) is used instead of photon
flux because the definition works well only when the quantities are of the same order. This is
true for the energy fluxes (because the spectra are not too far from an E^-2 power law) but not
for photon fluxes.
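
Equation (1) and its limiting cases can be illustrated with a few lines of code. The function name and the input values are made up for illustration; only the formula itself comes from the text.

```python
def hardness_ratio(energy_flux_i, energy_flux_j):
    """Equation (1): band i is the lower-energy band, band j the higher one."""
    total = energy_flux_i + energy_flux_j
    if total <= 0:
        raise ValueError("at least one band must have positive energy flux")
    return (energy_flux_j - energy_flux_i) / total

# Illustrative energy fluxes in arbitrary but identical units.
soft = hardness_ratio(4.0, 0.0)   # nothing in the higher band
hard = hardness_ratio(0.0, 4.0)   # nothing in the lower band
flat = hardness_ratio(2.0, 2.0)   # equal energy flux in both bands
```

Here `soft`, `hard`, and `flat` evaluate to -1.0, 1.0, and 0.0, matching the limits quoted above.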

It is also possible to define a quantity that discriminates curvature by combining two hardness
ratios, preferably from bands with a high number of detected sources. Here we use (HR23 - HR34),
where bands 2, 3, and 4 are for 0.3-1, 1-3, and 3-10 GeV respectively. This hardness ratio
difference, or curvature, is positive for spectra curved downwards in νF_ν (like pulsars), zero for
power laws and negative for spectra curved upwards (with a strong high-energy component).
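
The statement that the curvature vanishes for a power law can be made explicit. The sketch below assumes that each band's νF_ν is evaluated at the geometric mean energy of the band, which is an illustrative choice rather than something stated in the text. For a photon spectrum dN/dE ∝ E^{-Γ}:

```latex
\nu F_\nu(E) \propto E^{2-\Gamma},
\qquad
r \equiv \frac{\nu F_\nu(E_{k+1})}{\nu F_\nu(E_k)}
  = \left(\frac{E_{k+1}}{E_k}\right)^{2-\Gamma},
\qquad
HR_{k,k+1} = \frac{r-1}{r+1}.

E_2 = \sqrt{0.3 \times 1},\quad
E_3 = \sqrt{1 \times 3},\quad
E_4 = \sqrt{3 \times 10}\ \mathrm{GeV}
\;\Rightarrow\;
\frac{E_3}{E_2} = \frac{E_4}{E_3} = \sqrt{10}
\;\Rightarrow\;
HR_{23} - HR_{34} = 0 .
```

Because the energy ratio is the same for both band pairs, r is the same and the two hardness ratios cancel for any Γ. A spectrum that steepens with energy has a smaller r between bands 3 and 4 than between bands 2 and 3, so HR23 > HR34 and the curvature is positive.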

To remove the source significance dependency for variability, we use the fractional variability
(as defined in Equation 5 of Abdo et al. 2010a) instead of the variability index. The fractional
variability is:

$$\mathrm{FracVar} = \sqrt{\frac{\sum_i \left(\mathrm{Flux}_i - \mathrm{Flux}_{av}\right)^2}{(N_{int} - 1)\,\mathrm{Flux}_{av}^2} - \frac{\sum_i \sigma_i^2}{N_{int}\,\mathrm{Flux}_{av}^2} - f_{rel}^2} \tag{2}$$

where N_int is the number of time intervals (11 in 1FGL), σ_i is the statistical uncertainty in
Flux_i, and f_rel is an estimate of the systematic uncertainty on the flux for each interval. (Here
we use 3% as in 1FGL.) For some 1FGL sources the quantity inside the square root is negative.
Those sources are assigned a fractional variability of 2%. Figure 7 shows fractional variability
versus the curvature (HR23 - HR34) for both the associated (top panel) and unassociated (bottom
panel) 1FGL sources.
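
A minimal sketch of the fractional variability calculation follows. It assumes that the average flux is the unweighted mean of the interval fluxes and that the 2% value is applied whenever the radicand is negative. The function name, argument names, and sample values are illustrative only.

```python
import math

def fractional_variability(fluxes, sigmas, f_rel=0.03, floor=0.02):
    """Excess flux scatter beyond statistical and systematic errors."""
    n_int = len(fluxes)
    if n_int < 2 or n_int != len(sigmas):
        raise ValueError("need matching flux and error lists with >= 2 intervals")
    flux_av = sum(fluxes) / n_int          # assumed: plain mean of interval fluxes
    if flux_av <= 0:
        raise ValueError("average flux must be positive")
    scatter = sum((f - flux_av) ** 2 for f in fluxes) / ((n_int - 1) * flux_av ** 2)
    noise = sum(s ** 2 for s in sigmas) / (n_int * flux_av ** 2)
    radicand = scatter - noise - f_rel ** 2
    if radicand < 0:
        return floor                       # value assigned when scatter is below the noise
    return math.sqrt(radicand)

# A steady source: the scatter is smaller than the noise, so the floor applies.
steady = fractional_variability([2.0, 2.0, 2.0], [0.1, 0.1, 0.1])
# A strongly varying source with negligible errors.
flaring = fractional_variability([1.0, 3.0], [0.0, 0.0], f_rel=0.0)
```

Unlike the variability index, this quantity is normalized by the average flux, so a bright and a faint source with the same relative flux changes yield comparable values.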

To allow the largest possible sample when performing the classification, we use the actual best
fit values in calculating both these quantities, even when the 1FGL analysis reported only 2-sigma
upper limits. For such sources the variable is not well constrained. However, it will contribute to
the distribution used to determine the classifications and may provide a small amount of additional
discriminating power.



3.2. Classification using Classification Trees 

We have implemented two different data mining techniques to determine likely source
classifications for the 1FGL unassociated sources: Classification Trees (this section) and Logistic
Regression (Section 3.3). Both techniques use identified objects to build up a classification analysis
which provides the probability for an unidentified source to belong to a given class. We applied these
techniques to the sources in 1FGL to provide a set of classification probabilities for each unidentified
source.

Classification Trees are a well-established class of algorithms in the general framework of data
mining and machine learning (Breiman et al. 1984). The general principle of machine learning
is to train an algorithm to predict membership of cases or objects to the classes of a dependent
variable from their measurements on one or more input variables. The advantage of this class of
algorithms is the ability to produce a unique predictor parameter that takes into account several
input quantities simultaneously. Other well-known flavors of machine learning algorithms include
artificial neural networks, support vector machines and Bayesian networks.

The purpose of analyses via tree-building algorithms is to determine a set of if-then logical
conditions (called tree-nodes) that permit accurate prediction or classification of cases: the training
procedure generates a tree in which each decision node contains a test on some input variable's
value. The trees in this analysis are built through a process known as binary recursive partitioning,
which is an iterative process of splitting the data into partitions. Initially all the records in the
training set are assigned to a single class: the algorithm then tries breaking up the data, using
every possible binary split on every field. The particular advantage of Classification Trees for our
specific application is their flexibility in handling sparse or uneven distributions.

The specific flavor of algorithm used in this case was Adaptive Boosting, which reweights the
importance of input sources at each step of the classification. The training of boosted decision trees
is a recursive procedure in which the weights of each incorrectly classified example are increased
at each step, so that the new classifier focuses more on those examples. The output of such a
procedure is actually a collection of trees, all grown from the same input sample: the selection is
made by a majority vote on the result of several decision trees (200 in the present case). In the
following text, we will always refer to a single Classification Tree for simplicity, even though the
real classifier is a much more complex object.

The training and application of Classification Trees to this specific analysis has been performed
using TMine, an interactive software tool for processing complex classification analyses developed
within the Fermi-LAT collaboration (Drlica-Wagner & Charles 2011). TMine is based on ROOT
(Brun & Rademakers 1997), a very popular data analysis framework for high energy physics
experiments. For the processing and parallel evaluation of multivariate classification algorithms, TMine
utilizes the ROOT Toolkit for Multivariate Analysis (TMVA) (Hocker et al. 2007).
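
The reweighting idea behind Adaptive Boosting can be illustrated with a self-contained toy. This is not the TMine or TMVA implementation: it uses one-node trees ("stumps") instead of full trees, a single made-up input variable, three boosting rounds instead of 200, and a weighted vote. All names and data are hypothetical; labels +1 and -1 stand in for the two source classes.

```python
import math

def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best single split."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def stump_predict(stump, row):
    _, f, t, sign = stump
    return sign if row[f] > t else -sign

def adaboost(X, y, n_rounds):
    """Train a weighted collection of stumps; misclassified examples gain weight."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-12)
        if err >= 0.5:
            break                                  # no better than chance: stop
        alpha = 0.5 * math.log((1.0 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy sample that no single threshold can separate.
X = [[float(v)] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(X, y, 1)
three_rounds = adaboost(X, y, 3)
errors_one = sum(predict(one_round, r) != t for r, t in zip(X, y))
errors_three = sum(predict(three_rounds, r) != t for r, t in zip(X, y))
```

On this toy sample a single stump misclassifies three examples, while the three-stump ensemble classifies all ten correctly, which is the behavior that motivates boosting.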






3.2.1. Selection of the relevant training variables

The first step of the CT analysis is to select a sample of data to build the predictor variable.
We decided to focus on the two most abundant classes of objects in the Fermi catalog, AGN and
pulsars, and to train a Classification Tree to discriminate between them. We trained a predictor
using all the associated AGNs and pulsars in the 1FGL catalog (Abdo et al. 2010a). Because
of the spectral similarities with pulsars, potential associations for sources near SNRs have been
combined with the pulsar sample. The output of this training process is a parameter (the
predictor) which describes the probability for a new source to be an AGN.

Choosing the most appropriate set of variables for training the Classification Tree is a very
delicate step in the analysis. It is extremely important to ensure that the selected variables are not
dependent on the flux, the location or the significance of the source: this check can be accomplished
by comparing the distribution of the various parameters for associated and unassociated sources.
Physical considerations about the γ-ray properties of each source class also guided us in the choice
of the most effective variables for discriminating AGN from pulsars (Section 3.1).



After exploring most of the parameters in the 1FGL catalog, we selected a set of variables
that includes: the curvature (HR23 − HR34), the spectral index and the fractional variability of
each source, plus the Hardness Ratios for the 5 energy bands in the catalog. Table 3 ranks the
relative importance of the different variables in distinguishing AGNs from pulsars: the weight of
each variable was computed by the Classification Tree algorithm during the training process.



As described in Section 3.1, we used the actual best-fit values of each variable for training,
even when the 1FGL analysis reported only upper limits. For faint sources that are not
detected in one of the energy bands, some of the Hardness Ratios will be very close to −1 or 1: this
is where the ability of Classification Trees to handle sparse distributions is particularly useful.
Similar considerations apply to the fractional variability, which is very close to zero for pulsars, but
varies for AGNs.
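To make the behavior of these variables concrete, the following sketch computes a hardness ratio and the curvature from band fluxes. The definition HR_ij = (F_j − F_i) / (F_j + F_i), with F the energy flux in each band, is an assumption here (the definition belongs to Section 3.1 and is not reproduced in this section). The function names are illustrative.

```python
def hardness_ratio(flux_i, flux_j):
    """Assumed definition: (F_j - F_i) / (F_j + F_i), bounded between -1 and 1."""
    total = flux_i + flux_j
    if total == 0:
        raise ValueError("both bands have zero flux")
    return (flux_j - flux_i) / total

def curvature(band_fluxes):
    """Curvature HR23 - HR34 from the five band fluxes, lowest energy band first."""
    hr23 = hardness_ratio(band_fluxes[1], band_fluxes[2])
    hr34 = hardness_ratio(band_fluxes[2], band_fluxes[3])
    return hr23 - hr34
```

With this definition, a source that is undetected in one of two adjacent bands (best-fit flux near zero) yields a ratio near −1 or 1, which is the sparse regime mentioned above.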

We chose not to use the Galactic latitude as input to the CT in order to avoid biasing our
selection against AGN situated along the Galactic plane and pulsars (especially MSPs) situated
at high Galactic latitude. Furthermore, this choice gives us the opportunity to use the latitude
distributions of the different populations as a cross check of our result. (The new pulsar
candidates should be mainly distributed along the Galactic plane, while the AGN candidates should be
isotropically distributed.)

While training the Classification Tree, 30% of known AGNs and pulsars, randomly selected
from the input sample, were kept aside for cross-validation of the method. Such cross-validation
was performed by comparing the predictor distributions for the training and testing samples via a
Kolmogorov-Smirnov test. The test yields a probability of 93% for the AGN distributions
and 47% for the pulsar distributions, i.e. no significant difference between the training and
testing samples, which we consider fully satisfactory.
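A comparison of this kind can be sketched as follows. The code computes the two-sample Kolmogorov-Smirnov statistic and an approximate probability using the standard asymptotic series. This is an illustrative stand-in rather than the routine used in the analysis, and the function names are hypothetical.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_probability(d, n_a, n_b, terms=100):
    """Approximate probability of a distance >= d if both samples share a parent."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here; the probability is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(max(2.0 * total, 0.0), 1.0)
```

A large probability, as found for both the AGN and pulsar predictor distributions, indicates that the training and testing samples are statistically indistinguishable.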

It must also be noted that the sources associated with a class other than AGN or pulsar
have been excluded from this training procedure, for a total of 24 sources. We cannot treat these
24 sources uniformly as "background", because the sample is small and its spectral properties
are diverse. However, it is possible to estimate the contamination of the candidate
AGN and pulsar samples deriving from the likely presence of these "other" sources in the
unassociated sample.

3.2.2. Output of the Classification Procedure

The second step of the analysis consists of deriving the predictor variable for unassociated
sources by applying the Classification Tree that was trained in the previous step. The resulting
predictor is a parameter that describes the probability that each of the unassociated sources is
an AGN. The predictor is included in Table 4, which lists all 630 unassociated Fermi-LAT 1FGL
sources, and combines results for all the analyses discussed within this paper.

Figure 8 shows the distribution of the predictor for the 1FGL associated sources used in the
training of the tree (left panel) and the distribution of the predictor for the unassociated sources
(right panel). The global shapes of the two distributions are clearly different, with an apparent
excess of pulsar-like sources among the unassociated sources when compared to the associated
source distribution. This may be due to the presumably different fractions of AGN and pulsars in
the associated and unassociated samples, or there may be an additional contributing component.
Nevertheless, the distribution of associated sources clearly shows that we can select a set of AGN
and pulsar candidates with high confidence, when choosing the appropriate fiducial regions.

We set two fiducial thresholds: all the sources with a predictor greater than 0.75 are classified
as AGN candidates, while all the sources with a predictor smaller than 0.6 are classified as pulsar
candidates. All the sources with an intermediate value of the predictor remain unclassified after
the CT analysis. The choice of these boundaries is optimized for an efficiency of 80% for the two
source classes in order to keep the misclassification fraction under 2% (the misclassified fraction for
a certain efficiency is determined by the width of the predictor distribution). Here, 80% of AGN
associations in 1FGL have a predictor greater than 0.75 and 80% of pulsars have a predictor smaller
than 0.6.

In this case, the extrapolation from the value of the predictor to the probability of class
membership was performed empirically from the combined input sample (which includes both
the training and testing samples). The expected misclassification fraction in each class was also
evaluated with the same method. This analysis was repeated using the training and testing samples
separately and yielded identical results. A more complex study used the area under the Receiver
Operating Characteristic (ROC) curve, which is obtained by plotting the fraction of true positives
against the fraction of false positives as the decision threshold is varied. This study
provided similar extrapolation results, but more optimistic misclassification fractions: we therefore
decided to rely on the more conservative misclassification estimation provided by the combined
input sample.
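The fiducial cuts and the ROC-based check can be summarized in a short sketch. The cut values are those quoted above; the helper names and the pairwise-comparison estimate of the area under the ROC curve are illustrative choices, not the TMine procedure.

```python
def classify_ct(predictor, agn_cut=0.75, pulsar_cut=0.6):
    """Apply the two fiducial thresholds to a CT predictor value."""
    if predictor > agn_cut:
        return "AGN"
    if predictor < pulsar_cut:
        return "pulsar"
    return "unclassified"

def fraction_above(predictors, cut):
    """Fraction of a sample with predictor above the cut (e.g. AGN efficiency)."""
    return sum(1 for p in predictors if p > cut) / len(predictors)

def roc_auc(agn_predictors, pulsar_predictors):
    """Area under the ROC curve: probability that a random AGN outranks a random pulsar."""
    wins = 0.0
    for a in agn_predictors:
        for p in pulsar_predictors:
            if a > p:
                wins += 1.0
            elif a == p:
                wins += 0.5
    return wins / (len(agn_predictors) * len(pulsar_predictors))
```

Applying `fraction_above` to the known AGN gives the efficiency of the AGN cut, while applying it to the known pulsars gives the fraction of pulsars that would be misclassified as AGN candidates.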

The predictor distribution for the 24 sources that were not used during the training procedure
can be used to estimate the further contamination from these sources to the AGN and pulsar
candidate distributions: we expect that up to 2% of the newly classified AGN candidates and up
to 4% of the newly classified pulsar candidates will indeed belong to one of the "other" classes
(galaxies, globular clusters, supernova remnants, etc.).



3.3. Classification using Logistic Regression

Another approach to assign likely classifications for the 1FGL unassociated sources is the
Logistic Regression (LR) analysis method (Hosmer & Lemeshow 2000). Unlike the CT analysis,
LR allows us to quantify the probability of correct classification based on fitting a model form to
the data.

LR is part of a class of generalized linear models and is one of the simplest data mining
techniques. LR forms a multivariate relation between a dependent variable that can only take
values from 0 to 1 and several independent variables. When the dependent variable has only
two possible assignment categories, the simplicity of the LR method can be a benefit over other
discriminant analyses.

In our case, the dependent variable is a binary variable that represents the classification of
a given 1FGL unassociated source. Quantitatively, the relationship between the classification and its
dependence on several variables can be expressed as:

$$P = \frac{1}{1 + e^{-z}} \qquad (3)$$

where $P$ is the probability of the classification, and $z$ can be defined as a linear combination:

$$z = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_n x_n \qquad (4)$$

where $b_0$ is the intercept of the model, the $b_i$ ($i = 1, 2, \ldots, n$) are the slope coefficients of the
LR model and the $x_i$ ($i = 1, 2, \ldots, n$) are the independent variables. Therefore, LR evaluates
the probability of association with a particular class of sources as a function of the independent
variables (e.g. spectral shape or variability).

Much like linear regression, LR finds a "best fitting" equation. However, the principles on
which it is based are rather different. Instead of using a least-squares deviations criterion for
the best fit, it uses a maximum-likelihood method, which maximizes the probability of matching
the associations in the training sample by optimizing the regression coefficients. As a result, the
goodness of fit and overall significance statistics used in LR are different from those used in linear
regression.
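As an illustration of the maximum-likelihood fit, the sketch below estimates the coefficients of Equations (3) and (4) by simple gradient ascent on the log-likelihood. The optimizer, step size, and toy data are assumptions made for this example; they are not the fitting procedure used for the analysis.

```python
import math

def logistic(z):
    """Equation (3): P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_combination(b, x):
    """Equation (4): z = b0 + b1*x1 + ... + bn*xn."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

def fit_logistic(X, y, step=0.1, n_iter=5000):
    """Maximize the log-likelihood of labels y (0 or 1) by gradient ascent."""
    b = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(b)
        for xi, yi in zip(X, y):
            residual = yi - logistic(linear_combination(b, xi))
            grad[0] += residual
            for j, xij in enumerate(xi):
                grad[j + 1] += residual * xij
        b = [bj + step * g / len(X) for bj, g in zip(b, grad)]
    return b

# Toy, non-separable sample: one variable, label 1 more likely at large x.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
coefficients = fit_logistic(X, y)
```

The fitted coefficients then give, through Equation (3), the probability of class membership for any new set of independent variables.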



3.3.1. Selection of the training sample and the predictor variables 

As LR is a supervised data mining technique, it must be trained on known objects in order to
predict the membership of a new object to a given class on the basis of its observables. As with CT
analysis, we trained the predictor using the pulsar and AGN associated sources in the 1FGL catalog
(Abdo et al. 2010a). Like the CT analysis, the output of this training process is the probability
that an unidentified source has characteristics more similar to an AGN than to a pulsar.

To evaluate the best predictor variables for the LR analysis, we used the likelihood ratio test,
comparing the likelihoods of the models not including (null hypothesis) and including (alternative
hypothesis) the predictor variable under examination. We started by using the fractional variability,
the spectral index, the hardness ratios for the 5 energy bands in the catalog and the position on the
sky (i.e. the Galactic latitude and longitude). The likelihood ratio test yields a p-value,
which is useful in determining whether a predictor variable is significant in distinguishing an AGN from
a pulsar. If the p-value for a given predictor variable is smaller than the significance threshold α
(0.05), then the predictor variable is included in the multivariate LR model. We did not include
the curvature value (HR23 − HR34) in this evaluation because the LR analysis does not work well
with predictor variables that are linearly dependent on other predictor variables.
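The selection rule can be sketched as follows, under the assumption that the test statistic D = 2 (ln L_alt − ln L_null) follows a chi-squared distribution with one degree of freedom (one added variable). That assumption is the standard asymptotic result and is not stated in the text. The function names are illustrative.

```python
import math

def likelihood_ratio_pvalue(loglike_null, loglike_alt):
    """p-value of the likelihood ratio test for one added predictor variable."""
    d = max(2.0 * (loglike_alt - loglike_null), 0.0)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(d / 2.0))

def keep_variable(loglike_null, loglike_alt, alpha=0.05):
    """Include the variable in the multivariate model if p < alpha."""
    return likelihood_ratio_pvalue(loglike_null, loglike_alt) < alpha
```

For example, an improvement of about 1.92 in the log-likelihood corresponds to D of about 3.84 and a p-value close to the 0.05 threshold.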

We then calculated the significance of each predictor variable to find the resulting LR
coefficients. The list of the LR predictor variables with the relative values of the maximum likelihood
ratios can be found in Table 5.

Because AGN are isotropically distributed and pulsars are concentrated along the Galactic plane,
we wanted to see whether our multivariate LR model was able to recognize this effect. The results
indicate that Galactic latitude and longitude are not significant at the α = 0.05 (5% significance)
level. Moreover, we find that HR12 is also not highly significant in the LR analysis. It is interesting
to note that HR12 is quite significant in the univariate LR model (p-value = 0.02) for distinguishing
between AGNs and pulsars, but loses its significance in the multivariate LR analysis. In Table 5,
those predictor variables selected for the LR model are above the line and those we did not select
lie below the line.

3.3.2. Defining thresholds

Next we derive the predictor variable for 1FGL unassociated sources by applying the trained
classification analysis to those sources. Since the LR analysis used AGNs as the primary source type,
the output parameter (A) listed in Table 4 describes the probability that an unassociated source
is an AGN. The probability that an unassociated source is a pulsar is P = 1 − A (because we are
modeling the behavior of AGNs as the "opposite" of the behavior of the pulsars based on the predictor
variables).

In principle, the dependent variable is a binary variable that represents the presence or absence
of a particular class of objects. We could have selected "pulsars" and "non-pulsars" (e.g. all other
1FGL associated sources) to teach the model to recognize the new pulsars, and done similarly for
the AGNs. We did not follow this approach because there are no source populations in 1FGL other
than AGNs and pulsars with sufficient numbers to significantly affect the results. By focusing on
"opposing" characteristics, we improve the efficiency of classifying new AGN or pulsar candidates.

As with the CT analysis, we defined two threshold values, one to classify an AGN candidate
(C_A) and one to classify a pulsar candidate (C_P). We chose these two thresholds by analyzing the
ROC curves so that 80% of the AGN associations in 1FGL would have a predictor value greater
than C_A and 80% of the pulsars would have a predictor value smaller than C_P, and to result in very
low contamination. Using this principle we set C_A to 0.98 and C_P to 0.62. With these thresholds,
only 1% of AGNs are misclassified as pulsars, while 3% of pulsars are classified as AGN.

To estimate how accurately our predictive model performs in practice, we cross-validated using
only the 756 pulsars and AGNs in the 1FGL catalog. We held out 75 sources to be the testing
data set, and we used the remaining 681 for training. We repeated this procedure 10 times, using
a different set of 75 test sources in each data set. At the end, this 10-fold cross-validation showed
that the average testing efficiency rates for these threshold values are 75% for pulsars and 80% for
AGNs, and that the average testing error rates (false positives) are very low, 0.05% for pulsars and
0.02% for AGNs. The 5% lower success rate for the pulsars is likely due to low statistics in the test
samples.
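The bookkeeping of such a cross-validation can be sketched as below. The splitting scheme is an assumption for illustration: it partitions all 756 sources into 10 disjoint folds, so the test sets contain 75 or 76 sources rather than exactly 75, and the random seed is arbitrary.

```python
import random

def k_fold_splits(n_sources, k=10, seed=0):
    """Yield (train_indices, test_indices) for k disjoint test sets."""
    indices = list(range(n_sources))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def mean(values):
    """Average of the per-fold efficiencies or error rates."""
    return sum(values) / len(values)

splits = list(k_fold_splits(756))
```

For each split, the model would be trained on the `train` indices and the efficiency and false-positive rate evaluated on the `test` indices; the quoted rates correspond to the averages over the 10 iterations.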

If we apply the model to the 1FGL unassociated sources we find that 368 are classified as
AGN candidates (P > 0.98), 122 are classified as pulsar candidates (P < 0.62) and 140 remain
unclassified after the LR analysis. The distributions of 1FGL associated and unassociated sources
as a function of the probability of being pulsars are shown in Figure 9. The thresholds for
assigning pulsar candidates and AGN candidates are indicated in the figure. It is important to
note that in order to meet the acceptance threshold of 80% of the known pulsars, we are including
a large range of predictor values with very few pulsars. This may result in over-predicting the
number of pulsars in unassociated sources.

3.4. Combining the two classification methods

The two classification techniques gave somewhat different results. Of the 630 unassociated
sources in 1FGL, both techniques agreed on the appropriate classification for 57.6% of the sources
(363), while they gave conflicting classifications for 5.4% (34 sources). The remaining 253 sources
were left unclassified by one or both techniques (see Table 6). Studies comparing these classification
techniques (Perlich et al. 2003) have indicated that in data sets with good separability between
the discriminating characteristics, the CT analysis should provide a more robust result. However,
it is evident from the right panels of Figures 8 and 9 that the signal to noise for the unassociated
sources does not provide such clear separability.

Since the purpose of this analysis is to provide candidate sources for follow-up multiwavelength
studies, we use the positive results from both techniques to generate our candidate lists. From these
two methods we can synthesize a final set of classifications for each source, where:

• AGN candidates must be classified by at least one method, and the other method must not
disagree (that is, not classify it as a pulsar).

• Pulsar candidates must be classified by at least one method, and the other method must not
disagree (that is, not classify it as an AGN).

• Unclassified sources are not classified by either method.

• "Conflicting" sources are those that have been assigned opposite classifications (one AGN
and one pulsar) by the two different methods.
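The rules listed above can be written compactly as a function of the two individual results. The label strings and the use of `None` for "not classified by this method" are conventions chosen for this sketch only.

```python
def combine(ct_class, lr_class):
    """Combine CT and LR results; each input is "AGN", "pulsar", or None."""
    results = {ct_class, lr_class}
    if results == {"AGN", "pulsar"}:
        return "conflicting"
    if "AGN" in results:
        return "AGN candidate"      # at least one method says AGN, none says pulsar
    if "pulsar" in results:
        return "pulsar candidate"   # at least one method says pulsar, none says AGN
    return "unclassified"           # neither method classified the source
```

For instance, a source classified as an AGN by the LR analysis and left unclassified by the CT analysis is retained as an AGN candidate.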

Based on these definitions, there are 396 AGN candidates (269 are classified as AGN by both
methods), 159 pulsar candidates (72 classified as pulsars by both methods), 41 unclassified sources,
and 34 conflicting sources in the 1FGL unassociated source list. Figure 10 shows the
curvature-variability distribution of the newly classified AGN and pulsar candidates based on this synthesis of
the two methods. We see that the unassociated sources have been separated into two populations
with some overlap between them. Comparing this distribution to Figure [T] (top panel), we see that
this separation follows the separation seen between the associated AGN and pulsars.



4. Recent association efforts

In order to validate the results of the classification methods, we must obtain an independent
set of associations from those used to train the two methods. For this, we look to association
efforts that have taken place since the release of the 1FGL catalog. This section will present the
new associations, while the classification validation is reported in Section 5.

The associations listed in 1FGL were based on likely γ-ray-producing objects, i.e. those with
energetic, non-thermal emission. The first LAT catalog of Active Galactic Nuclei (AGN) published
shortly afterward found high-confidence AGN associations for 671 high Galactic latitude 1FGL
sources, with an additional 155 LAT sources included in the low-latitude and lower confidence
association lists (1LAC; Abdo et al. 2010k). The 1LAC association method was the same as for
1FGL, but the acceptance threshold for association was lower than for 1FGL.

For the unassociated sources in this paper, which by definition do not have plausible
counterparts among the candidate catalogs used for comparison with the LAT sources, we must look
beyond the obvious candidate source classes. Even with the improved source locations provided by
the Fermi-LAT, however, positional coincidence with a particular object is insufficient to claim an
association.

If potential candidates can be found, then additional tests, based on spatial morphology,
correlated variability, or physical modeling of multiwavelength properties, offer the opportunity to
expand the list of associations. X-ray, optical, or radio candidate counterparts all have better
localizations than the γ-ray sources, so that a candidate in one of these wavelength bands can be
matched with those in others. Also, most (if not all) of the catalogs and observations used here to
find new associations are not complete surveys of the sky. Therefore the lack of an association for
a 1FGL source does not mean that the source cannot be associated. In this section we present only
a preliminary report of the results from the many ongoing efforts to observe these fields in other
wavebands.



4.1. Radio searches for AGN 



The first step in searching for (or excluding) AGN counterparts of Fermi-LAT unassociated
sources is to consult catalogs of radio sources. Almost all radio AGN candidates of possible interest
are detected either in the NRAO VLA Sky Survey (NVSS; Condon et al. 1998) or the Sydney
University Molonglo Sky Survey (SUMSS; Bock et al. 1999). NVSS covers the entire δ > −40° sky
and provides interferometric flux density measurements at 1.4 GHz. SUMSS covers the remainder
of the sky and offers interferometric measurements at 0.843 GHz.

In order to discover AGN counterparts, which are radio sources with compact, flat-spectrum
cores, we follow the approach of the Cosmic Lens All Sky Survey (CLASS; Myers et al. 2003; Browne
et al. 2003) and the Combined Radio All-Sky Targeted Eight GHz Survey (CRATES; Healey
et al. 2007), both of which have been shown (Abdo et al. 2009a, 2010k) to include substantial
numbers of radio counterparts of LAT blazars, and pursue 8.4 GHz follow-up interferometry of
blazar candidates. The Fermi-motivated VLA programs AH976 (Healey et al. 2009) and AH996 (in
progress) obtained such data for several hundred sources, and ~50 of these appear as "affiliations"
(i.e., candidate counterparts for which quantitative association probabilities could not be computed)
in the 1LAC catalog. 108 new associations, including the "figure of merit" value for each association
(Abdo et al. 2010k), were determined by these VLA follow-up programs.

Recently, serendipitous radio identification surveys of 1FGL sources have been independently
carried out using the recently released Australia Telescope 20 GHz radio source catalog (AT20G; Murphy
et al. 2010), which contains entries for 5890 sources observed at δ < 0°. Cross-correlation between
the 1FGL source list and the AT20G catalog has been performed by Mahony et al. (2010) and
Ghirlanda et al. (2010). In particular, Mahony et al. (2010) find correlated radio sources for 233
1FGL sources and, based on Monte Carlo tests, they infer that 95% of these matches are genuine
associations. While most of these radio detections are not identified or classified as specific object
types, nine of the sources are considered likely to be AGNs (based on their properties at radio
frequencies). Ghirlanda et al. (2010) obtain a similar number of matches with the AT20G
catalog (~230) and propose eight of the same sources as likely AGN. All but one of these new
associations were also found by the VLA observing program.

To date, radio observations of sources in the 1FGL unassociated sample have produced 109
new AGN associations. These new AGN associations are included in Table 7. In addition, the
1LAC catalog documented 57 other AGN associations with 1FGL unassociated sources. These
have also been added to the table. Where possible, names have been adjusted to be consistent with
the NASA/IPAC Extragalactic Database^ nomenclature.



4.2. Radio searches for pulsars 



Of the 56 γ-ray emitting pulsars identified in 1FGL, 32 were detected by folding the γ-ray
data using timing solutions from observations of known radio pulsars. These ephemerides have
been collected by a global consortium of radio astronomers who dedicate a portion of their time
toward this effort (Smith et al. 2008). The 32 pulsars (23 young and nine MSPs) had all been
discovered in the radio band prior to their detection by the LAT. Since the release of the 1FGL
catalog, twelve more of the 1FGL sources have been found to have γ-ray pulsations by using
ephemerides of known radio pulsars. While eight of the twelve were associated in 1FGL with a
pulsar or a SNR/PWN, four were listed as unassociated sources in the catalog (two young pulsars
and two MSPs). These new associations are listed in Table 7.
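The idea of folding photon arrival times with a known ephemeris can be illustrated with a deliberately simplified sketch. It assumes that the arrival times are already corrected to the solar system barycenter and that the timing solution consists only of a spin frequency `f0` and its first derivative `f1` at a reference epoch. Real timing solutions contain further terms, and the names used here are hypothetical.

```python
def pulse_phase(t, epoch, f0, f1=0.0):
    """Rotational phase in [0, 1) at time t (seconds) for a simple ephemeris."""
    dt = t - epoch
    return (f0 * dt + 0.5 * f1 * dt * dt) % 1.0

def fold(times, epoch, f0, f1=0.0, n_bins=10):
    """Histogram of photon phases; a pulsed signal concentrates in a few bins."""
    histogram = [0] * n_bins
    for t in times:
        index = min(int(pulse_phase(t, epoch, f0, f1) * n_bins), n_bins - 1)
        histogram[index] += 1
    return histogram
```

If the ephemeris matches the source, the photons accumulate at preferred phases; for an unrelated source the folded histogram is statistically flat.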

In addition to folding data using the properties of known radio pulsars, a promising technique
for identifying unassociated sources is searching for previously unknown radio pulsars that might
be powering the γ-ray emission. This technique was used on many of the EGRET unidentified
sources (e.g., Champion et al. 2005; Crawford et al. 2006; Keith et al. 2008) with modest
success, because the error boxes were many times larger than a typical radio telescope beam. With
the LAT, the unassociated source localizations are a much better match to radio telescope beam
widths and generally each can be searched in a single pointing.



^http://ned.ipac.caltech.edu/

Thus far, over 450 unassociated LAT sources, mostly at high Galactic latitudes, have been
searched by the Fermi Pulsar Search Consortium (PSC; Ransom et al. 2011) at 350, 820, or 1400
MHz. The target lists for these searches were selected from the LAT unassociated sources, with
preference for those that displayed low variability and a spectrum consistent with an exponential
cutoff in the few GeV range, as seen in the identified γ-ray pulsar population (Abdo et al. 20101).
This program has resulted in the discovery of 32 previously unknown radio pulsars (one young
pulsar and 31 MSPs) (Ray & Parkinson 2011; Keith et al. 2011; Cognard et al. 2011; Bangale et al.
2011) that are included in Table 7. Of these 32 new pulsars, 14 also show pulsations in γ rays.
There is no obvious correlation between the γ-ray and radio fluxes of these pulsars, so additional
discoveries can be expected as fainter unassociated 1FGL sources are searched.

Searches by the PSC continue. In the Galactic plane, high dispersion measures and sky
temperatures demand higher frequency observations with concomitantly smaller beam sizes. Young,
energetic pulsars can be very faint in the radio (Camilo et al. 2002b,a). Nevertheless, we expect
that deep observations will continue to turn up more discoveries of radio pulsars in unassociated
1FGL sources in the near future.

To summarize, radio observations of sources in the 1FGL unassociated sample have produced 36
new pulsar associations. 18 of these pulsars are considered firm identifications due to the detection
of pulsations in the LAT data.



4.3. X-ray observations of unassociated source fields

To look for additional possible counterparts we cross-correlated the list of unassociated 1FGL
sources with existing X-ray source catalogs. We stress that the resulting compilation has no claim
of completeness since the match with cataloged X-ray sources depends on the serendipitous sky
coverage provided by the X-ray observations, and the integration time of the observation. While
it is possible that candidate X-ray counterparts to the LAT unassociated sources may be singled
out on the basis of, e.g., their brightness and/or spectral properties, most will be recognized only
through a coordinated multiwavelength identification approach (which is beyond the scope of this
paper).

To begin, we considered the 2XMM source catalog derived from pointed XMM-Newton
observations (Watson et al. 2008). The fourth incremental release of the catalog (2XMMi) includes
191,870 unique X-ray sources extracted from all XMM-Newton observations that were public as of
2008 May 1, i.e. performed through the end of 2007 April. We cross-correlated the LAT source lists
with the 2XMMi catalog, using a cross-correlation radius equal to the semi-major axis of the 95%
confidence ellipse of the LAT source, and found that 40 of the 1FGL unassociated fields contained
2XMMi detections. Of these 40, four had been found to be associated with AGN by the radio
follow-up observations (Section 4.1).



By looking at the XMM-Newton observation log (up to 2011 February 27) we can estimate the
potential increase in the number of matches that could occur if we were to use the longer
observational database. For the cross-correlation we used the LAT 95% semi-major axis and compared
against a radius equal to the sum in quadrature of the EPIC camera radius (15') and the r95
positional uncertainty for the X-ray source. Here we found another 17 fields of unassociated LAT
sources that have been observed by XMM-Newton, either serendipitously or in pointed observations.
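The positional cross-correlation used in this section can be sketched with a few helper functions. The haversine formula for the angular separation and the helper names are choices made for this illustration; the treatment of the LAT error region as a circle with radius equal to the 95% semi-major axis follows the description above.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def quadrature_radius(*radii):
    """Sum in quadrature, e.g. of the 15' EPIC radius and an r95 uncertainty."""
    return math.sqrt(sum(r * r for r in radii))

def is_match(lat_position, lat_semi_major_deg, xray_position):
    """True if the X-ray source lies within the LAT 95% semi-major axis."""
    separation = angular_separation_deg(*lat_position, *xray_position)
    return separation <= lat_semi_major_deg
```

Positions are (right ascension, declination) pairs in degrees, and all radii passed to `quadrature_radius` must share the same unit.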

In addition, we cross-correlated with the ROSAT All Sky Survey Bright and Faint Source
Catalogs (Voges et al. 1999, 2000a) and with the ROSAT catalog of pointed Position-Sensitive
Proportional Counter (PSPC) (Voges et al. 1994) and High Resolution Imager (HRI) (Voges et al.
2000b) observations. These found 101 unassociated source fields with ROSAT counterparts, 15 of
which were also found in the 2XMMi correlation. These results are summarized in Table 7 and
show that a preliminary X-ray screening provides potential X-ray counterparts for about 20% of
all the Fermi-LAT unassociated sources. These possible X-ray counterparts are obviously prime
targets for multiwavelength follow-up observations.

We also compared the 1FGL unassociated source list with recently published catalogs of hard
X-ray/soft γ-ray sources. These are the Palermo Swift-BAT Hard X-ray Catalog (Cusumano et al.
2010), which is a compilation of 754 sources detected by the BAT instrument in its first 39 months
of operations, and the 4th IBIS/ISGRI Soft γ-ray Survey Catalog (Bird et al. 2010), which includes a
total of 723 sources. Both catalogs contain flux and spectral information and provide likely source
identification/classification. We cross-correlated the 1FGL unassociated sources with the Swift-BAT
and IBIS/ISGRI source catalogs using the nominal 1FGL position and 95% semi-major axis as before,
and found 11 new associations and 8 new X-ray detections (cases where the candidate X-ray
counterpart is not a known γ-ray emitter). The 1FGL 95% positional error is larger than the
positional errors on sources in both catalogs. These results are also included in Table 7, where we
give the Swift-BAT and Integral-IBIS/ISGRI source identifications and proposed catalog classifications.

We note that a preliminary cross-correlation of the LAT unassociated sources with the ROSAT
sources has been performed by Stephen et al. (2010). They used only the ROSAT Bright
Source Catalog as a reference and found, on statistical grounds, that 60 of the 77 correlated
positions should be genuine associations. However, they provide likely associations for only 30 of
the correlations, those with counterparts within 160". Table 7 lists 26 of these correlated sources.
Three are not listed because the counterpart source type is not a known γ-ray emitter. The final
source is not listed because no counterpart name was provided.



A survey of 21 fields of unassociated 1FGL sources was carried out by Mirabal (2009) using
data from the Swift science archive. This investigation indicated X-ray detections for seven LAT
unassociated sources based on positional correlation with Swift-XRT sources and the likelihood the
source is a member of a γ-ray emitting class. Three of these are unique to this investigation and
have been included as X-ray detections. In addition, Mirabal et al. (2010) proposed nine possible
associations for unassociated LAT sources at |b| > 25° in the 3000 square degree "overlap region",
a region covered by various radio surveys and by the Sloan Digital Sky Survey. Associations and
detections from both these investigations are included in Table 7.

The unassociated source 1FGL J1958.9+3459 appears to be nearly coincident with the HMXB
Cygnus X-1, which was recently reported as an AGILE source (Sabatini et al. 2010). While this
source meets the criteria to claim a positional association, there is no clear evidence that the source
detected by the LAT is Cyg X-1. In addition, the source 1FGL J1045.2−5942 is positionally
coincident with the luminous blue variable (LBV) star, Eta Carinae (η Car). While X-ray observations
of η Car show a 5.54 year periodicity, the γ-ray flux remained constant during the most recent
X-ray minimum in 2008 December - 2009 January. However, due to its unusual γ-ray spectrum
this 1FGL source is still believed to be associated with η Car (Abdo et al. 2010b).

To date, X-ray observations have led to positional associations with ten AGNs, seven HMXBs,
one SNR and the LBV Eta Carinae. An additional 110 sources have X-ray detections that are
excellent targets for follow-up multiwavelength observations. These associations can be found in
Table 3.



4.4. TeV observations of unassociated sources 



Fermi-LAT spectra have been shown to be good predictors of TeV emission, with 55 1FGL
sources having very high energy (VHE) counterparts (Abdo et al. 2009e, 2010l). The energy range
from ~50 GeV to ~300 GeV is the only range where the LAT data directly overlap with other
instruments.

The LAT team has provided recommendations for follow-up observations of a number of
hard-spectrum sources - including unassociated hard-spectrum sources - that may have VHE
counterparts. Coordinated follow-up observations in the TeV regime have been useful in identifying
LAT-detected AGNs (see e.g. Abdo et al. 2009e; Mariotti & MAGIC Collaboration 2010). In
addition, LAT sources have been identified as SNRs by comparing the extension in the LAT data to
the VHE emission, using the same procedure as was used for W51C (Abdo et al. 2009d), W44
(Abdo et al. 2010f) and IC443 (Abdo et al. 2010h). This search has yielded two more identified
SNRs, W28 (Abdo et al. 2010c) and W49B (Abdo et al. 2010e). These associations can be found
in Table 7.

We cross-checked the 1FGL unassociated source list with the list of TeV sources from TeVCat
(http://tevcat.uchicago.edu) and current publications. We consider a source to be coincident with a
LAT source if its extension overlaps with the 95% confidence ellipse of the LAT source. We find
nine TeV sources that are coincident with 1FGL unassociated sources (Table 8). Note that the 1FGL
association process did not assign an association to a coincident TeV source if that TeV source had
no identification in another waveband. Pismis 22 and W43 are possible (but not confirmed)
associations with the TeV source.



One source of note is 1FGL J1702.4-4147, which lies on the emission "tail" of the elongated
VHE source, H.E.S.S. J1702-402. The nearby pulsar PSR J1702-4128 lies at the edge of the
TeV γ-ray emission and would provide enough spin-down energy loss to produce the observed VHE
emission via an extremely asymmetric PWN (Aharonian et al. 2008). Hence the pulsar is considered
to be a likely counterpart to the LAT source. To date there has not been a high-significance
detection of pulsations in the LAT data. An additional interesting source, 1FGL J1839.1-0543,
is positionally coincident with HESS J1841-055, one of the most extended (1 deg in diameter)
H.E.S.S. unidentified sources. Because of the high density of potential counterpart sources in this
low-latitude region, there are multiple possible associations for the VHE source (2 SNRs, 3 high
spin-down PSRs, 1 XRB) (Aharonian et al. 2008). Given its high TeV γ-ray flux, it is considered
a good candidate for LAT detection (Tibolla 2009).
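
The coincidence criterion used for the TeVCat cross-check in this section (TeV extension overlapping the LAT 95% confidence ellipse) can be sketched in a simplified form. The sketch below is only an illustration under stated assumptions: it replaces the LAT ellipse by a circle whose radius is the semi-major axis (so it is more permissive than a true ellipse test), treats the TeV extension as a circular radius, and uses hypothetical function names and coordinates that do not come from the catalog.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def is_coincident(lat_pos, lat_semi_major_95, tev_pos, tev_extension_radius):
    """Conservative overlap test, all angles in degrees.

    Assumption: the LAT 95% ellipse is approximated by a circle of radius
    equal to its semi-major axis, and the TeV source by a circle of radius
    equal to its extension. Two circles overlap when the separation of their
    centres does not exceed the sum of their radii.
    """
    sep = angular_separation_deg(*lat_pos, *tev_pos)
    return sep <= lat_semi_major_95 + tev_extension_radius


# Hypothetical positions (RA, Dec in degrees), not taken from any catalog.
print(is_coincident((10.0, 0.0), 0.1, (10.5, 0.0), 0.5))
print(is_coincident((10.0, 0.0), 0.1, (12.0, 0.0), 0.5))
```

A full treatment would test the TeV extension against the actual ellipse orientation and axis ratio; the circular approximation is chosen here only to keep the geometry short.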



4.5. Association of LAT sources using only LAT data 



A small number of sources have been associated or identified since the release of the 1FGL
catalog by using LAT data alone. Of the 56 pulsars listed in 1FGL, 24 were discovered using blind
frequency searches (Abdo et al. 2009b) for γ-ray pulsations from the bright unassociated sources.
These are typically young pulsars, for which the solid angle of the radio beam is likely to be much
smaller than the γ-ray one (Abdo et al. 2010l). As a result of this geometry, many unassociated
sources are likely to be young, radio-quiet pulsars that will never be found in radio searches. Since
the release of the 1FGL catalog, another blind search pulsar, PSR J0734-1559, has been discovered
in an unassociated LAT source. This association has been included in Table 7.



One new HMXB has also been detected in the LAT data (Corbet et al. 2011), though in 1FGL
the source was associated with the SNR G284.3-01.8. This is the first of its type to be discovered
in γ rays. These new associations are also included in Table 7.



5. Discussion 

The follow-up multiwavelength association efforts discussed in Section 4 have resulted in 177
new extragalactic source associations (all AGN), and 52 new Galactic associations (one source has
both a Galactic and extragalactic association). When we compare these new associations with
the expected 1FGL source distributions discussed in Section 2.4, the estimated numbers of sources
that have not yet been associated are reduced to 182 for extragalactic sources and 219 for Galactic
sources.

To test the two classification algorithms and to estimate the efficiency for identifying the
different source classes, we compare the results to the new source associations in Section 4, first
looking at individual methods, and then the combined results. In addition, we consider how the
results match the LogN-LogS analysis.

5.1. Validating the classification results from the separate methods

We separately compared the results of the Classification Tree analysis and the Logistic
Regression analysis to the new source associations in Section 4. Altogether, the new source
associations include 177 new AGN associations as well as 37 new pulsar associations that are divided
into 2 categories: 20 objects for which pulsations have been detected in γ rays (which we will refer
to as "new pulsar detections") and 17 objects for which pulsations have been detected only in the
radio (which we will refer to as "new pulsar candidates"). We will not use new associations from
other source types (HMXB, PWN, SNRs) for this validation.

For AGN, we find that 126 sources are correctly classified as AGN candidates by the CT analysis
(efficiency: 71%), 11 were classified as pulsar candidates (false negative: 6%), while the remaining
40 sources were considered still unclassified (23%). The same comparison for the LR analysis gave
142 sources correctly classified as AGN candidates (efficiency: 80%), 7 sources classified incorrectly
as pulsar candidates (false negative: 4%), while the other 28 sources remained unclassified (16% of
the sample).

For pulsars, we noticed a very different performance between new pulsar detections and new
pulsar candidates. For the 20 sources detected as pulsars by the LAT, the CT analysis correctly
classifies 14 pulsars (efficiency: 70%), misclassifies one source (5%), and leaves the remaining
sources unclassified (25%). The LR analysis correctly classifies 11 pulsars (efficiency: 55%),
misclassifies one source (5%), and leaves the remaining sources unclassified (40%). On the other hand,
for the new radio pulsar candidates, the CT analysis correctly classifies only 3 objects as pulsars
(efficiency: 18%), misclassifies 8 objects as AGN (false negative: 47%) and leaves the 6 remaining
objects as still unassociated (35% of the new pulsar candidates). The LR analysis correctly classifies
4 pulsar candidates (efficiency: 23.5%), misclassifies four sources (23.5%), and leaves the remaining
nine sources unclassified (53%).
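
The percentages quoted in this subsection follow directly from the counts. A minimal sketch of that bookkeeping is given below; the counts are those stated in the text, while the function name, dictionary layout and one-decimal rounding are choices made here for illustration only.

```python
def rates(correct, wrong, total):
    """Return (efficiency, misclassified, unclassified) in percent.

    The unclassified count is whatever remains after removing the correctly
    and incorrectly classified sources from the validation sample.
    """
    unclassified = total - correct - wrong
    return tuple(round(100 * n / total, 1) for n in (correct, wrong, unclassified))


# (correct, wrong, sample size) for each validation sample and method,
# as quoted in Section 5.1. CT = Classification Tree, LR = Logistic Regression.
validation = {
    ("AGN", "CT"): (126, 11, 177),
    ("AGN", "LR"): (142, 7, 177),
    ("pulsar detections", "CT"): (14, 1, 20),
    ("pulsar detections", "LR"): (11, 1, 20),
    ("pulsar candidates", "CT"): (3, 8, 17),
    ("pulsar candidates", "LR"): (4, 4, 17),
}

table = {key: rates(*counts) for key, counts in validation.items()}

for (sample, method), (eff, wrong, unclassified) in table.items():
    print(f"{sample:18s} {method}: efficiency {eff}%, "
          f"misclassified {wrong}%, unclassified {unclassified}%")
```

Rounded to whole percent, these values reproduce the figures in the text (for example 126/177 is about 71% and 3/17 about 18%).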

These results are interesting, because the definition of the pulsar fiducial threshold in the LR
analysis appeared likely to over-estimate the number of pulsar candidates. However, the Logistic
Regression actually has a somewhat poorer success rate for finding new pulsars and pulsar
candidates than the Classification Tree analysis. Looking more closely at the 1FGL properties of the
misclassified pulsar candidates, we found that twelve of the 17 new pulsar candidates have only
upper limits for the 300 MeV - 1 GeV band. In contrast, 80% of the new pulsar detections were
significantly detected in this portion of the LAT spectrum. This difference in characteristics for the
two pulsar groups may indicate the need for additional criteria when selecting sources for follow-up
observations.

5.2. Validating the results from the combined classifications



We can also compare the new associations to the combined classifications defined in Section 3.4.
Of the 214 newly-associated AGN and pulsars from Section 4, 171 sources (151 new AGN, 16 new
pulsar detections, and 4 new pulsar candidates) match the classification given by the combined
analysis, and 26 sources (15 new AGN, 1 new pulsar detection, and 10 new pulsar candidates)
are in direct conflict with the classification source type. This gives an efficiency of 85% for AGN
classification and 80% for classification of new pulsar detections, but only 59% for new pulsar
candidates. 17 of the newly associated sources are unclassified by either method, and only one
source has conflicting source classification. The one conflicting source turns out to be a new pulsar
candidate that also has an AGN association, suggesting the LAT source could be the sum of these
two objects. The overall efficiency for this combined sample is ~80%, comparable to the value we
were seeking when we set the fiducial values for the two methods. The combined sample has a false
negative rate of ~12%.
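
The overall efficiency and false negative rate of the combined sample can be recomputed from the per-class counts given above. The following is a small sketch of that arithmetic, using only the counts quoted in the text; the variable names are illustrative.

```python
# (matched, conflicting, sample size) per class, as quoted in Section 5.2.
combined = {
    "AGN": (151, 15, 177),
    "pulsar detections": (16, 1, 20),
    "pulsar candidates": (4, 10, 17),
}

total = sum(n for _, _, n in combined.values())
matched = sum(m for m, _, _ in combined.values())
conflicting = sum(c for _, c, _ in combined.values())
unclassified = total - matched - conflicting

overall_efficiency = 100 * matched / total       # percent matching the new association
false_negative_rate = 100 * conflicting / total  # percent in direct conflict

# Per-class (matched %, conflicting %) for reference.
per_class = {
    name: (round(100 * m / n, 1), round(100 * c / n, 1))
    for name, (m, c, n) in combined.items()
}

print(total, matched, conflicting, unclassified)
print(round(overall_efficiency, 1), round(false_negative_rate, 1))
print(per_class)
```

The totals come out as 214 sources, of which 171 match, 26 conflict and 17 are unclassified, giving about 80% and 12% for the two overall rates. For the pulsar candidates, the 10 conflicting sources are about 59% of the 17, while the 4 matching sources are about 24%.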

The spatial distributions of the newly classified sources give us the opportunity to cross-check
our results. Figure 11 shows the spatial distribution of all the sources. Notice that both the AGN
and pulsar distributions are as expected, even though we have not used the Galactic latitude as
an input to either classification method. The pulsar candidates are mainly distributed along the
Galactic plane, with a few high-latitude exceptions that suggest additional nearby MSPs, while the
AGN candidates are nearly isotropically distributed on the sky.

From this we can conclude that, using only the γ-ray properties of the Fermi-LAT sources
and the firm associations of the 1FGL, we were able to develop a prediction for AGN and pulsar
classification that nearly matches our expectations (i.e. pulsar candidates are not variable, have a
curved spectrum and are mainly distributed along the Galactic plane, while AGN candidates are
mostly extragalactic, variable sources). In all, the efficiency of the combined classification methods
at classifying new AGNs is high, with a low rate of false negatives, while the efficiency for new
pulsar candidates is much lower than expected.

AGN and pulsars are not the only γ-ray source classes known or expected, but the less-populous
source types are hard to classify using the techniques described here because the training samples
are too small in 1FGL. With time, those training samples will likely grow, and we may be able to
extend this analysis to additional classifications.



5.3. Comparison to the LogN-LogS predictions

In addition, we can check to see how the classification results compare to the predictions made
by the LogN-LogS analysis in Section 2.4. For this comparison, we consider a pulsar classification
to be indicative of a Galactic source, and an AGN classification to be indicative of an extragalactic
source.

Since 229 of the unassociated sources now have associations, we will consider only those 401
that remain unassociated. Thirty-three of these are unclassified by either technique, and 13 have
conflicting classifications. The numbers of sources at low and high latitudes for the remaining
355 are shown in Table 2. At high latitudes, the observed numbers of both Galactic and
extragalactic sources are consistent with the numbers expected from the LogN-LogS analysis. In contrast,
at low latitudes, the number of Galactic sources is about one-third lower than expected, and the
number of extragalactic sources is higher than expected. It is clear that the sources that are
hardest to associate are those of Galactic origin at low latitudes, likely due to the presence of a
population of spurious sources in that region in the 1FGL catalog.
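
The source counts used in this comparison can be cross-checked against the totals given in the abstract. A short sketch of that bookkeeping, with all numbers taken from the text and the variable names chosen here for illustration:

```python
# Counts quoted in the abstract and in Section 5.3.
unassociated_1fgl = 630      # unassociated sources in 1FGL
new_associations = 229       # sources that now have associations
unclassified = 33            # unclassified by either technique
conflicting = 13             # conflicting classifications
agn_like, pulsar_like = 221, 134  # classified sources quoted in the abstract

still_unassociated = unassociated_1fgl - new_associations
classified = still_unassociated - unclassified - conflicting

print(still_unassociated, classified, agn_like + pulsar_like)
```

The remaining unassociated sample is 401 sources, of which 355 carry a single classification, matching the sum of the AGN-like and pulsar-like totals in the abstract.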



Figure 12 (left panel) shows the latitude distribution of the classified unassociated sources
based on classification type. If we combine the AGN candidate population with the 1FGL sources
that already have AGN associations (Figure 12, right panel), we find that the shape of the AGN
distribution matches reasonably with that predicted by the model, though there is still an excess
at low Galactic latitudes.



5.4. Unassociated sources at low Galactic latitudes 

It is the unassociated sources in the central 20° of Galactic latitude (|b| < 10°) that may hold
clues to the content of the narrow Galactic ridge population at |b| < 0.5°, |l| < 60° discussed in
Section 2. To investigate these sources, we separate the pulsar candidates from the other types
of sources (Figure 13) and consider the distribution. Where the full set of unassociated sources
appears to indicate an unreasonably narrow scale height of ~50 pc for the population, the latitude
distribution of pulsar candidates is somewhat closer to expectations, implying a scale height of ~85
pc. This value is one-third the scale height of LAT-detected γ-ray pulsars (Abdo et al. 2010l).



For γ-ray sources, this population scale height suggests instead that Population I objects such
as SNRs, with a scale height of ~100 pc (Lamb & Macomb 1997), are likely significant contributors
to the 1FGL sources. In the 1FGL catalog, 44 sources were associated with or identified as SNRs,
and this paper associates six more (Table 7), giving a total of 50 SNR associations. Considering
that only 63 1FGL sources were associated/identified with pulsars, it is clear that both source types
are significant contributors. The classification method used here does not consistently label SNRs
as pulsar candidates. Of the six new SNR associations, five are classified as pulsar candidates, and
the sixth is classified as an AGN. In the future it may be useful to consider the SNR source class
separately as an input to such classification analyses.

This central portion of the Galaxy is also the region that has most of the sources that are
either unclassified or have conflicting classifications. Of the 257 unassociated sources in this region,
22 have no classification, and 29 have conflicting classifications between the two methods. Of these
51 unclassified or conflicting-classification sources, eight have new associations. The source types
are varied: two MSPs, one young pulsar, three HMXBs and two AGN. It is clear that not all of
these sources can be spurious; however, it is unlikely that the remaining 43 are all real detections.



5.5. Informing future follow-up observations

The results of the classification analyses demonstrate that source properties measured with
the Fermi-LAT can provide important guidance on what types of follow-up observations are likely
to be fruitful for many of these unassociated sources. The emphasis in follow-up observations of
LAT sources has been on radio imaging and timing observations for a large number of sources,
as well as targeted X-ray observations for sources of interest (e.g. flaring sources or new radio
pulsar candidates). In addition, there is an on-going program to observe all the bright,
well-localized Fermi-LAT unassociated sources with Swift that may add important new insights into
these sources as a group.

In Table 4, the last 4 columns show what follow-up observations are recommended in several
wavebands. Sources classified as likely blazars would benefit from radio searches for
flat-spectrum sources within the LAT error ellipse. Low-frequency radio timing is recommended
for the likely pulsars. X-ray observations of likely pulsars can give timing observers seed locations
at which to search for pulsations in both radio and LAT data (Caraveo 2009). Still-unassociated
sources that may benefit from such observations have been flagged in the 1FGL unassociated source
list with the appropriate observation type. These are suggestions; it is highly likely that some of
the sources are misclassified. Also, a number of the follow-up observations discussed here have
yielded no new associations for some of the observed unassociated sources.

We strongly recommend additional joint analyses of LAT and ground-based VHE γ-ray data
for very low-latitude (|b| < 0.5°) Fermi sources. Together, these may give insight into whether or
not a population of SNRs can account for a significant number of the 1FGL unassociated sources
along the Galactic ridge. Sources for which this type of analysis is recommended are indicated in
the 1FGL unassociated source list (Table 4).



5.6. Remaining BSL unassociated sources 

Follow-up observations like those discussed in the previous section have made a significant
impact, increasing the associated fraction for the 1FGL catalog from ~56.5% to ~71% in only a
little more than one year. But we can look farther back to the Fermi-LAT Bright Source List (BSL;
Abdo et al. 2009f), the list of 205 high-significance (> 10σ) LAT sources detected in the first three
months of the Fermi mission. Of the 37 sources listed as unassociated in the BSL, ten now have
pulsar identifications or associations, while eight have new AGN associations. In addition, seven
have been associated with other γ-ray source types such as SNRs or HMXBs. These associations
bring the BSL association rate up to 94%.

^ http://www.swift.psu.edu/unassociated/
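
The BSL association rate quoted above follows from the counts in this paragraph. A brief sketch of the arithmetic, with numbers taken from the text and variable names chosen for illustration:

```python
bsl_total = 205          # sources in the Bright Source List
bsl_unassociated = 37    # listed as unassociated in the BSL
new_pulsars, new_agn, new_other = 10, 8, 7  # associations found since

remaining = bsl_unassociated - (new_pulsars + new_agn + new_other)
association_rate = 100 * (bsl_total - remaining) / bsl_total

print(remaining, round(association_rate, 1))
```

This leaves 12 BSL sources without an association, corresponding to a rate of about 94%.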

Of the remaining twelve BSL unassociated sources, five lie in the Galactic ridge and should be
considered with caution. Here we look more closely at these 12 sources:






• 1FGL J0910.4-5055 (0FGL J0910.2-5044) - This Galactic plane transient (l, b = 271.7°, -1.96°)
flared once in October 2008, an event lasting 1-2 days with peak flux (E > 100 MeV)
of 1 × 10^-[?] ph cm^-2 s^-1 (Cheung et al. 2008). (1FGL reported a peak flux for this source
of 1.97 × 10^-[?] ph cm^-2 s^-1, but that was averaged over one month.) Recent figure-of-merit
analysis by Murphy et al. (2010) has given this source a > 80% probability of being associated
with the radio source AT20G J091058-504807. Our analysis does not find such an association,
though Mirabal (2009) does show an association with the likely blazar Swift J0910.9-5048.

• 1FGL J1311.7-3429 (0FGL J1311.9-3419) - Besides being associated with 3EG J1314-3431,
this very bright high-Galactic-latitude (l, b = 307.7°, 28.2°) source is not variable and has a
spectrum with a high-energy cutoff very similar to a pulsar. To date, searches for both γ-ray
and radio pulsations from this source have been unsuccessful (Ransom et al. 2011).

• 1FGL J1536.5-4949 (0FGL J1536.7-4947) - With no significant variability, this
persistently bright mid-latitude (l, b = 328.2°, 4.8°) source is the only unassociated BSL source to
have conflicting classifications. This source has a moderately curved spectrum with a
high-energy cutoff.

• 1FGL J1620.8-4928c (0FGL J1622.4-4945) - This moderately bright source in the
Galactic ridge (l, b = 333.9°, 0.4°) is spatially coincident with the AGILE detection 1AGL J1624-4946,
and has a spectrum with a sharp spectral break at ~3 GeV. There is no evidence at this
time for pulsations from this source in either γ rays or radio (Ransom et al. 2011).

• 1FGL J1653.6-0158 (0FGL J1653.4-0200) - Another non-varying source with a pulsar-like
spectrum, this high-latitude (l, b = 16.6°, 24.9°) source is associated with 3EG J1652-0223.
Both classification methods call this source a pulsar candidate.

• 1FGL J1740.3-3053c (0FGL J1741.4-3046) - The position for this non-varying c-source
in the Galactic ridge (l, b = 357.7°, -0.1°) moved far enough between the two publications
that the two detections are not formally associated. However, we recall that for pulsars in
the plane, the 1FGL error ellipse appears to underreport the systematic error. As the BSL
source lies just outside the 1FGL 95% confidence contour, we consider the two detections to
be related.

• 1FGL J1839.1-0543c (0FGL J1839.0-0549) - This bright source in the Galactic ridge
(l, b = 26.4°, 0.1°) has a highly curved spectrum and does not vary. Both classification methods
call this source a pulsar candidate.

• 1FGL J1842.9-0359c (0FGL J1844.1-0335) - This Galactic ridge source (l, b = 28.4°, 0.1°)
had a source significance of 10σ in the first three months, but after eleven months that
value has increased only slightly, to 10.9σ (unless variable, a source should have twice the
significance after nearly four times the livetime). Since 1FGL found this source to be
non-varying, it is unlikely that the low flux in 1FGL is due to variability effects. Instead, it appears
that the longer data set has separated the BSL source into multiple components, leaving the
coincident 1FGL source at a lower-than-expected significance.

• 1FGL J1848.1-0145c (0FGL J1848.6-0138) - Another source in the Galactic ridge (l, b =
31.0°, -0.1°), its spectrum is consistent with a power-law, and the source may be related to a
TeV source (see Table 8).

• 1FGL J2027.6+3335 (0FGL J2027.5+3334) - A bright, mildly variable source that lies
near the Galactic plane (l, b = 73.3°, -2.9°), this source is associated with the EGRET source
3EG J2027+3429. In the 1FGL catalog, this source flux peaked at 3.4 × 10^-[?] ph cm^-2 s^-1 with
a significance > 10σ in a single month. The spectrum has a large curvature index, indicating
that it is not a simple power-law. Although the variability seems to indicate a possible AGN,
both classification methods consider it a likely pulsar.

• 1FGL J2111.3+4607 (0FGL J2110.8+4608) - While this source near the Galactic plane
(l, b = 88.3°, -1.4°) is highly significant, it is flagged in the 1FGL catalog as having a flux
measurement that is sensitive to changes in the diffuse model. Even so, the spectrum is
moderately pulsar-like with a high-energy cutoff and there is no hint of variability. Both
classification techniques consider this source a pulsar candidate.

• 1FGL J2339.7-0531 (0FGL J2339.8-0530) - This very hard source (spectral index = 1.99)
lies at high Galactic latitude (l, b = 81.4°, -62.5°). While its 1FGL five-band spectrum suggests
a blazar, neither classification method was able to classify this source.
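
The remark in the entry for 1FGL J1842.9-0359c, that a steady source should roughly double its significance after nearly four times the livetime, reflects the usual square-root scaling of significance with exposure. A minimal sketch under that assumption (steady source, constant background, significance proportional to the square root of the livetime) is shown below; the function name is illustrative and the months are used as a proxy for livetime.

```python
import math


def expected_significance(sigma_initial, livetime_initial, livetime_final):
    """Scale a detection significance assuming sigma grows as sqrt(livetime).

    Assumption: a steady source on a constant background, so that the
    accumulated signal-to-noise ratio scales with the square root of exposure.
    """
    return sigma_initial * math.sqrt(livetime_final / livetime_initial)


# Values quoted in the text: 10 sigma after three months, 10.9 sigma after eleven.
expected = expected_significance(10.0, 3.0, 11.0)
observed = 10.9
print(round(expected, 1), observed)
```

Under this scaling the expected significance after eleven months is roughly 19σ, well above the observed 10.9σ, which is the shortfall the text attributes to the BSL source being split into multiple components.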



While five of these are c-sources, only two (1FGL J1842.9-0359c and 1FGL J1848.1-0145c)
appear to be questionable, though the possibility of a TeV component for the latter should be
investigated. The two variable sources seem likely to be AGN. The γ-ray characteristics of the
majority of the other unassociated BSL sources imply that these sources are bright, steady, and
have curved spectra with high-energy cutoffs. With the exception of 1FGL J0910.4-5055, all these
sources were included in the searches for radio pulsations, the same searches which have resulted in
the discovery of ten new MSPs in unassociated BSL sources. Blind searches for pulsations in the
γ-ray data have also been performed on these sources, with no success.

One source of interest is 1FGL J2017.3+0603, a BSL source that now has detected pulsations
in the LAT data (Cognard et al. 2011) [...] (0.923) in 1LAC (Abdo et al. 2010k). [...]
the LAT flux is due solely to the pulsar, or is a combination of both counterparts.

5.7. Comparing with EGRET unassociated sources



Although the present paper is focused on the Fermi-LAT unassociated sources, some insight
about these sources may be found from the all-sky survey with EGRET on the Compton Gamma
Ray Observatory. Using the LAT results, with higher sensitivity and better source locations, as
a reference, we re-examine the sources of two EGRET catalogs: the 3EG catalog (Hartman et al.
1999) and the EGR catalog (Casandjian & Grenier 2008). The two catalogs were based largely on
the same data, while they used different models of the diffuse γ radiation that forms a background
for all sources. In addition, noting those EGRET unassociated sources that remain unassociated
in the 1FGL catalog offers the opportunity to recognize sources that remain interesting on a time
scale of decades.



5.7.1. Experience with the EGRET Catalogs

A comparison of the two EGRET catalogs with each other and with the LAT 1FGL catalog
yields several observations and tentative conclusions, illustrated by specific examples. Because
more detailed information is available about the 3EG sources, many of these results emphasize that
catalog, although similar considerations likely apply to the EGR analysis.

1. The statistical error contours produced for EGRET underestimated the full uncertainty in
the source localization. The Crab, Vela, Geminga, and PSR J1709-4429 pulsars, which are
positively identified by γ-ray pulsations in both EGRET and Fermi-LAT data, have positions
outside the formal error contours, even at the 99% level, as noted by Hartman et al. (1999).
For 3EG, but not for EGR, the CTA 1 and LSI +61°303 sources, now firmly identified
by LAT, also lie outside the 95% error contours. For this reason, we expand the list of
plausible associations to include all those LAT sources whose position falls within the 99%
error contours. As noted by Hartman et al. (1999), however, the EGRET source localizations
were better at higher Galactic latitudes. Even bright AGN such as 3C279 were typically
found within the error contours.

2. The circular fit to the EGRET 95% error contour was a poor approximation in many cases.
In addition to the 107 matches between 1FGL and 3EG reported based on the automated
comparison using this circular fit, there are 21 more 1FGL sources found within the 95% error
contours of the detailed 3EG uncertainty contour maps (Hartman et al. 1999). Further, 19
1FGL sources are located between the 3EG 95% and 99% contours, making a total of 149
candidate associations, or 153 including the four bright pulsars identified by timing. The
14 other 1FGL sources that lie just outside the 3EG 99% contours are not included in this
analysis, although they remain potential associations.



3. The adopted diffuse background model is important both close to and far from the Galactic
plane. For example, EGR J0028+0457 is confirmed by the LAT as the millisecond pulsar PSR
J0030+0451 (Abdo et al. 2009h) at a Galactic latitude of -57°. There was no corresponding
source in the 3EG catalog, though a sub-threshold excess was seen (Thompson 2010).
Conversely, 3EG J1027-5817, at a Galactic latitude of -1°, is confirmed by the LAT as PSR
J1028-5819 (Abdo et al. 2009c), while the nearest EGR source is nearly 1° away with a 95%
error uncertainty of 0.22°.

4. Variability is an important consideration even for sources not associated with blazars. The
EGRET upper limit for radio galaxy NGC 1275, derived from Figure 3 of Hartman et al.
(1999), lies nearly an order of magnitude below the LAT detection level (Kataoka et al.
2010). 3EG J0516+2320 was a bright solar flare from 1991 June (Kanbach et al. 1993). The
1FGL catalog contains no solar flares. 3EG J1837-0423 (Tavani et al. 1997) was one of the
brightest sources in the γ-ray sky in 1995 June, at a flux (E > 100 MeV) of (3.1 ± 0.6) × 10^-[?]
ph cm^-2 s^-1. No 1FGL source is seen consistent in position with this source, even though
the LAT would have detected a source nearly 2 orders of magnitude fainter (see Figure 19 of
Abdo et al. 2010a).
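
Item 1 above widens the association criterion from the 95% to the 99% error contours. The sketch below shows how the two levels relate in the idealized case of a circular two-dimensional Gaussian position uncertainty, where the containment radius at probability p scales as sqrt(-2 ln(1 - p)). This is an assumption made here for illustration: as item 2 notes, the actual 3EG contours were often poorly described by a circle, and the comparison in the text relied on the detailed contour maps. The function names and the example radius are hypothetical.

```python
import math


def containment_scale(p_from, p_to):
    """Ratio of containment radii for a circular 2D Gaussian uncertainty.

    For such a distribution the radius enclosing probability p is
    sigma * sqrt(-2 * ln(1 - p)), so the ratio is independent of sigma.
    """
    return math.sqrt(math.log(1.0 - p_to) / math.log(1.0 - p_from))


def within_contour(separation_deg, r95_deg, level=0.99):
    """True if a counterpart lies inside the circular contour at `level`."""
    radius = r95_deg * containment_scale(0.95, level)
    return separation_deg <= radius


scale_95_to_99 = containment_scale(0.95, 0.99)
print(round(scale_95_to_99, 3))

# Hypothetical example: a source 0.55 deg from an EGRET position with r95 = 0.5 deg
# falls outside the 95% circle but inside the 99% one.
print(within_contour(0.55, 0.5, level=0.95), within_contour(0.55, 0.5, level=0.99))
```

In this idealized case the 99% radius is about 1.24 times the 95% radius, which gives a sense of how much additional sky the relaxed criterion admits.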



Some of these lessons from the EGRET era have already been applied to the 1FGL catalog
construction but are worth reiterating in any discussion of unassociated sources.






Unlike EGRET, the error contours for 1FGL include a systematic component of 10% added
to the statistical uncertainties. This component was derived, however, from high-latitude
AGN comparisons. The EGRET experience suggests that the situation will be more difficult
at lower Galactic latitudes. The low-latitude LAT source associated with LS5039 based on
periodicity, for example, has a measured position that lies at the 95% uncertainty contour
even after adding a 20% systematic component (Abdo et al. 2009g). This is an additional
indication that the LAT positional uncertainties for sources in the Galactic plane are affected
by the systematics discussed earlier.

Even without any additional systematic uncertainty, it should be remembered that at least
5%, or more than 70, of the 1FGL sources probably have true counterparts that fall outside
the 95% contours.

Just as the EGRET catalog used "C" for potentially confused regions and "em" for possibly
extended or multiple sources, the 1FGL catalog identifies sources with a "c" in the name or a
numerical flag to indicate possible uncertainties due to the analysis procedure or the diffuse
model. The LAT has confirmed ~67% of EGRET sources that do not carry either the "C"
or "em" flags, but has only confirmed ~38% of the flagged sources. This experience with
EGRET certainly suggests that such flags should be taken seriously.

5.7.2. 1FGL Unassociated Sources Remaining from the EGRET Era

Despite the gap of more than a decade between the EGRET and LAT observations, a sizable
number of the unassociated sources seen by EGRET are associated with 1FGL sources that are
unassociated with known source classes. Whether they are persistent or recurrent, such sources
offer a high potential for multiwavelength studies. Table 7 includes 43 positional coincidences we
have found between unassociated sources in the EGRET and 1FGL catalogs. As noted above,
the predominance of 3EG sources results at least partly from the additional information available
compared to the EGR catalog, which did not include confidence contour maps.

These unassociated sources are distributed widely across the sky. Only 10 of the 43 have "c"
designations in 1FGL, and all of these lie close to the Galactic plane toward the inner Galaxy,
where the Galactic diffuse emission is brightest and any deficiencies in the model of the diffuse
emission would have the greatest effect on properties of the 1FGL sources. Although the EGRET
localization uncertainties are large, the density of 1FGL sources away from the Galactic plane
is not so large that accidental coincidences are a significant problem. EGR J1642+3940 (1FGL
J1642.5+3947) appears to be a special case. It appeared in the EGRET data only after the end
of the 3EG data set, but it has been seen by the LAT. Casandjian & Grenier (2008) suggest an
association with blazar 3C 345, although it might also be associated with Mkn 501 (Kataoka et al.
1999). Although it is shown as unassociated in the 1FGL catalog, recent LAT analysis also suggests
that one or more blazars contribute to this source (Schinzel et al. 2010).



6. Conclusions 

As the Fermi mission matures, it is important to take a look at the successes of the early 
mission to help inform and improve the association and follow-up of new sources. The continued 
multiwavelength observations and ongoing statistical association efforts for the 1FGL unassociated 
sources have led to associations for 70% of the entire catalog. Since the release of the catalog, 45% 
of all the extragalactic sources expected from the LogN-LogS analysis have been associated. In 
addition, 47% of the expected high-latitude Galactic sources have been associated. Together, this 
gives associations for nearly 82% of all expected extragalactic and nearly 62% of expected 
high-latitude Galactic sources. However, there are associations for only ~38% of the expected 
sources of Galactic origin that lie at |b| < 10°. 

The significant improvement in sensitivity of the Fermi-LAT relative to EGRET, the sub- 
stantially improved positional errors, and the sky-survey viewing plan made possible by the large 
field of view of the LAT have generated an unprecedented data set and allowed the production 
of the deepest-ever catalog of the GeV sky. From that foundation, the astronomical community 
has worked in concert to discover new additions to known γ-ray source classes, as well as adding 
SNRs, PWNe, starburst galaxies, radio galaxies, HMXBs, globular clusters and a treasure trove of 
millisecond pulsars to the list. 

With that success as a backdrop, there are clearly lessons we have learned from the 1FGL 
catalog process and follow-up analyses, as well as from experience with previous missions: 

1. As discussed in Abdo et al. (2010a), it is clear that there is room for improvement in the plane 
   of the Galaxy, and especially the ridge. This region contributes numerous questionable sources 
   to the catalog. However, if we set that region aside, essentially half of the remaining 1FGL 
   unassociated sources in other regions now have associations, giving an overall association rate 
   of ~70% for 1FGL. In contrast, the 3rd EGRET catalog had an association rate of only 38%. 

2. Follow-up campaigns to associate or identify the LAT-detected sources have been extremely 
   successful. A total of 178 new blazar candidate associations have been made, primarily by 
   looking for radio candidates within the LAT error ellipses and then re-observing at additional 
   frequencies to determine if the source has the characteristic flat spectrum in the radio. 
   Thirty-one new Galactic field MSPs have been discovered based on locations provided by the 
   LAT for radio pulsation searches. 

3. By using the distribution of detected sources, we can model the 1FGL γ-ray sky and predict 
   how many of each general source type we expect in the catalog. After taking into account the 
   associations made since the release of the 1FGL catalog, this analysis indicates that, as the 
   mission continues, we might expect to find associations for at least ~200 more AGN (mostly 
   at high Galactic latitudes) and ~50 new pulsars (equally divided at high and low latitudes) 
   among the unassociated 1FGL sources. 

4. We have applied two analysis techniques to infer the likely classification for unassociated 
   sources, based solely on their γ-ray properties. The γ-ray properties of sources, while not 
   being sufficient on their own to determine source type, can provide important information 
   regarding the parent source classes. Using the information provided by the LAT to inform 
   the selection of γ-ray sources and wavelengths for follow-up studies can reduce the 
   labor-intensive nature of such observations and increase the likelihood of finding a viable 
   association candidate. A preliminary assessment of this process shows a success rate of - 

Table 1. Spatial distribution of various source associations from the 1FGL and 1LAC catalogs 

Source class        Sources at    Sources at    Ridge* 
                    |b| > 10°     |b| < 10°     sources 

Associated             670           151           31 
  AGN                  642            51            1 
  Pulsars               16            47           11 
  SNRs/PWNe              1            45           19 
  Other                 11             8 
Unassociated           373           257           88 
  Non-c sources        354           139 
  c-sources             19           118           88 

*Here, the Galactic ridge is defined as sources with |b| < 1° and |l| < 60°. This value is a 
subset of the previous column of |b| < 10° sources. 
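
The latitude and ridge selections used in Table 1 can be sketched as simple coordinate cuts. The snippet below is a minimal illustration, assuming Galactic longitude is supplied in degrees in the range [0, 360) and must be wrapped before taking |l|; the function names and the example coordinates are hypothetical and not taken from the catalog. How a source at exactly |b| = 10° is binned is not specified in the table, so the boundary handling here is an assumption.

```python
def in_galactic_ridge(l_deg: float, b_deg: float) -> bool:
    """Return True if (l, b) satisfies |b| < 1 deg and |l| < 60 deg."""
    # Wrap longitude into [-180, 180) so that |l| measures the angular
    # distance from the direction of the Galactic center.
    l_wrapped = (l_deg + 180.0) % 360.0 - 180.0
    return abs(b_deg) < 1.0 and abs(l_wrapped) < 60.0


def latitude_band(b_deg: float) -> str:
    """Label a source as low latitude (|b| < 10 deg) or high latitude."""
    # Assumption: a source at exactly |b| = 10 deg is counted as "high".
    return "low" if abs(b_deg) < 10.0 else "high"


# Hypothetical positions, for illustration only.
for l_deg, b_deg in [(359.2, -0.4), (75.0, 0.3), (10.0, 5.0), (120.0, 45.0)]:
    print(l_deg, b_deg, latitude_band(b_deg), in_galactic_ridge(l_deg, b_deg))
```

Because the ridge is a subset of the |b| < 10° band, every position for which `in_galactic_ridge` returns True is also labeled "low" by `latitude_band`.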



Table 2. Expected vs. Observed source distribution 

Source types                 Sources at       Sources at       Totals 
                             |b| > 10°        |b| < 10° 

Total detected               1043 (71.9%)     408 (28.1%)      1451 
Associated                   670              151              821 
Unassociated                 373              257              630 

Extragalactic 
  Total from LogN-LogS       972              88               1060 (73.1%) 
  Associated                 650              51               701 
  Not Associated in 1FGL     322              37               359 
  New Associations           150              27               177 
  New Classifications        154              67               221 
  LogN-LogS Comparison       -18              +57              +39 

Galactic 
  Total from LogN-LogS       71               320              391 (26.9%) 
  Associated                 20               100              120 
  Not Associated in 1FGL     51               220              271 
  New Associations           27               25               52 
  New Classifications        31               103              134 
  LogN-LogS Comparison       +7               -92              -85 

Note. — Results from LogN-LogS analysis, applied to the 1FGL source list. 
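
The derived rows of Table 2 can be recomputed from the other rows. The sketch below assumes that "Not Associated in 1FGL" is the expected total minus the associated count, and that the "LogN-LogS Comparison" row is the sum of associated, newly associated, and newly classified sources minus the expected total. This reading is inferred from the tabulated numbers rather than stated in the table note, and the dictionary layout and function name are illustrative only.

```python
# Table 2 counts as (|b| > 10 deg, |b| < 10 deg) pairs.
TABLE2 = {
    "Extragalactic": {
        "expected": (972, 88),
        "associated": (650, 51),
        "not_associated": (322, 37),
        "new_associations": (150, 27),
        "new_classifications": (154, 67),
        "comparison": (-18, 57),
    },
    "Galactic": {
        "expected": (71, 320),
        "associated": (20, 100),
        "not_associated": (51, 220),
        "new_associations": (27, 25),
        "new_classifications": (31, 103),
        "comparison": (7, -92),
    },
}


def derived_rows(block):
    """Recompute the two derived rows from the other rows of one block."""
    not_assoc = tuple(
        e - a for e, a in zip(block["expected"], block["associated"])
    )
    comparison = tuple(
        a + n + c - e
        for a, n, c, e in zip(
            block["associated"],
            block["new_associations"],
            block["new_classifications"],
            block["expected"],
        )
    )
    return not_assoc, comparison


for name, block in TABLE2.items():
    not_assoc, comparison = derived_rows(block)
    print(name, not_assoc, comparison, "total:", sum(comparison))
```

Under these assumptions the recomputed values match the tabulated rows, including the totals of +39 (extragalactic) and -85 (Galactic).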

Table 3. Ranking of the training variables for the Classification Tree. 

Variable                  Importance 

Fractional Variability      21.9% 
Hardness45                  16.0% 
Hardness23                  15.8% 
Spectral Index              13.0% 
Hardness12                  12.7% 
Hardness34                  11.8% 
Curvature                    8.8% 

Note. — List of the training variables for the Classification Tree: each variable is ranked 
according to its relevance in the discrimination process, as computed by the CT algorithm. 

Table 5. List of the predictor variables for the LR model. 

Variable                  Coefficient   Standard Error   p-value 

Intercept                   -22.17          4.97         <0.001 
Fractional Variability       10.61          1.49         <0.001 
Spectral Index               11.30          2.47         <0.001 
Hardness23                   -3.84          1.27          0.002 
Hardness34                    8.14          1.53         <0.001 
Hardness45                    3.72          0.76         <0.001 
----------------------------------------------------------------- 
Hardness12                                                0.242 
glat                                                      0.333 
glon                                                      0.144 

Note. — Variables selected for the Logistic Regression analysis are listed at top. Those 
rejected are listed below the line. 
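
To illustrate how coefficients such as those in Table 5 turn into a predictor value, the sketch below evaluates the standard logistic function on a linear combination of the selected variables. It assumes the usual logistic-regression form and that the predictor is the probability of the AGN class (suggested by the Fig. 9 thresholds, where high values mark AGN candidates). The feature scaling used in the original fit is not given here, so the input values are hypothetical and the output should not be read as a real classification.

```python
import math

# Coefficients of the selected variables, copied from Table 5.
INTERCEPT = -22.17
COEFFS = {
    "fractional_variability": 10.61,
    "spectral_index": 11.30,
    "hardness23": -3.84,
    "hardness34": 8.14,
    "hardness45": 3.72,
}


def lr_predictor(features: dict) -> float:
    """Logistic predictor for one source (assumed form, see lead-in)."""
    z = INTERCEPT + sum(COEFFS[name] * features[name] for name in COEFFS)
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical feature values, for illustration only.
example = {
    "fractional_variability": 0.5,
    "spectral_index": 2.2,
    "hardness23": 0.1,
    "hardness34": -0.1,
    "hardness45": -0.2,
}
print(f"predictor = {lr_predictor(example):.4f}")
```

With positive coefficients on fractional variability and spectral index, more variable and softer sources move toward higher predictor values in this sketch.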



Table 6. Comparison of results for the classification techniques 

Classification        AGN    Pulsar    Unclassified 

CT Totals             304     160         166 
  For |b| > 10°       244      33          96 
  For |b| < 10°        60     127          70 
LR Totals             368     122         140 
  For |b| > 10°       276      39          58 
  For |b| < 10°        92      83          82 
Combined              386     177          53 
  For |b| > 10°       300      50          22 
  For |b| < 10°        86     127          31 

                          LR Class 
CT Class          AGN    Pulsar    Unclassified 

AGN               269       2          33 
Pulsar             32      72          56 
Unclassified       94      31          41 

Note. — Summaries for both high- and low-Galactic-latitude classification results for the 
Logistic Regression (LR) and Classification Tree (CT) techniques, as well as for the combined 
sample of classified sources (14 sources with conflicting classification are not included). In 
addition, an intercomparison of the two classification techniques is provided. Italics indicate 
conflicting results, while bold indicates agreement. All 630 1FGL sources are represented here. 

Table 8. Candidate VHE counterparts 

1FGL Name        VHE Source Name    VHE Association      Reference 

J0648.8+1516     VER J0648+152      unidentified         Ong et al. (2010) 
J1503.4-5805     HESS J1503-582     unidentified         Renaud et al. (2008); Tibolla et al. (2009) 
J1614.7-5138c    HESS J1614-518     Pismis 22            Aharonian et al. (2006) 
J1702.4-4147     HESS J1702-402     PWN of PSR J1702     Aharonian et al. (2008) 
J1707.9-4110c    HESS J1708-410     unidentified         Aharonian et al. (2006) 
J1839.1-0543     HESS J1841-055     multiple sources     Aharonian et al. (2008) 
J1837.5-0659c    HESS J1837-069     unidentified         Aharonian et al. (2006) 
J1844.3-0309     HESS J1843-033     unidentified         Hoppe (2008) 
J1848.1-0145c    HESS J1848-018     W43                  Tibolla et al. (2009) 

Note. — Candidate VHE counterparts and their associations. Uncertain associations are in italics. 

Fig. 1. — Comparison of the 1FGL Variability Index versus Curvature Index for the associated 
sources (top panel) and unassociated sources (bottom). A separation between the AGN (crosses) 
and pulsar (circles) populations is evident. However, the unassociated sources mainly lie in the 
region where those two populations overlap. 

Fig. 2. — 1FGL sky map with the positions of the unassociated sources marked. Here, the non-c 
unassociated sources are indicated by crosses, the c-sources by circles. 




Fig. 3. — Distribution of 1FGL source types by Galactic latitude. The sources associated with 
AGN (blue line) show a clear deficit at low latitudes, while the same region hosts a large number 
of unassociated sources (yellow line) and identified pulsars (red line). 

Fig. 4. — Distribution of unassociated sources in the Galactic ridge. Left: Source flux (E > 1 GeV) 
for all 1FGL sources as a function of Galactic latitude in three longitude bands. The dashed line 
shows the threshold flux for detectability of a source with a power-law spectrum of photon spectral 
index Γ = 2.2 (from the 1FGL sensitivity map, at |l| = 0). An increase in minimum flux is clearly 
visible for sources near |b| = 0°. Right: Unassociated source counts in 0.25° bins. A sharp peak in 
the number of unassociated sources is visible clustered along the central 0.5° of Galactic latitude. 

Fig. 5. — Distributions with respect to flux of the spectral index (top), curvature index (middle) 
and variability index (bottom) for the 1FGL associated and identified sources. It is clear that the 
curvature index is dependent on source flux for both AGN (crosses) and pulsar (circles) populations. 
The high-flux pulsar with a low value for curvature index is the Crab pulsar. 

Fig. 6. — Latitude profile of the 1FGL sources and the extragalactic source model profile (dashed 
line). 

Fig. 7. — The fractional variability vs. hardness ratio difference. Top: the 1FGL associated AGN 
(blue crosses) and pulsars (red circles). Bottom: the 1FGL non-c unassociated sources (green 
crosses) and the c-sources (purple squares). 

Fig. 8. — Distribution of the Classification Tree predictor. Vertical lines indicate the value of the 
thresholds we set to identify AGN candidates (Predictor > 0.75) and pulsar candidates (Predictor 
< 0.6). Left: sources of the 1FGL catalog identified as pulsars (red) and AGN (blue). Right: 
Distribution of the predictor for unassociated sources. 

Fig. 9. — Distribution of the Logistic Regression predictor. Vertical lines indicate the value of the 
thresholds we set to identify pulsar candidates (Predictor < 0.62) and AGN candidates (Predictor 
> 0.98). Left: sources of the 1FGL catalog identified as pulsars (red) and AGN (blue). Right: for 
1FGL unassociated sources. 
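
The thresholds quoted in the captions of Figs. 8 and 9 can be applied as simple cuts on each predictor. The sketch below is illustrative only: the threshold values come from the captions, while the function names, the example predictor values, and in particular the rule for combining the two labels (agreement or a single classification wins, opposite labels are flagged as a conflict) are assumptions that are not spelled out in this section.

```python
# Thresholds from the captions of Fig. 8 (CT) and Fig. 9 (LR).
CT_AGN_MIN, CT_PSR_MAX = 0.75, 0.60
LR_AGN_MIN, LR_PSR_MAX = 0.98, 0.62


def classify(predictor: float, agn_min: float, psr_max: float) -> str:
    """Label one source from a single predictor value."""
    if predictor > agn_min:
        return "AGN"
    if predictor < psr_max:
        return "Pulsar"
    return "Unclassified"


def combine(ct_label: str, lr_label: str) -> str:
    """Combine CT and LR labels (assumed rule, not from the paper)."""
    if ct_label == lr_label:
        return ct_label
    if ct_label == "Unclassified":
        return lr_label
    if lr_label == "Unclassified":
        return ct_label
    return "Conflict"  # one technique says AGN, the other says Pulsar


# Hypothetical (CT predictor, LR predictor) pairs, for illustration only.
for ct_pred, lr_pred in [(0.90, 0.99), (0.30, 0.80), (0.90, 0.10), (0.70, 0.70)]:
    ct_label = classify(ct_pred, CT_AGN_MIN, CT_PSR_MAX)
    lr_label = classify(lr_pred, LR_AGN_MIN, LR_PSR_MAX)
    print(ct_pred, lr_pred, ct_label, lr_label, combine(ct_label, lr_label))
```

Keeping the cuts strict on both sides leaves a band of intermediate predictor values unclassified, which is why each technique in Table 6 reports an "Unclassified" column.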

Fig. 10. — Variability index versus curvature index for 1FGL unassociated sources classified as 
AGN (blue crosses) and pulsar candidates (red circles). 




Fig. 11. — The spatial distribution of the combined classification sample, in Galactic coordinates. 
Sources are classified as AGN candidates (blue diamonds), pulsar candidates (red circles), 
unclassified (green crosses), or in conflict (black squares). 

Fig. 12. — Left: Distribution of classified sources binned by Galactic latitude, with AGN in blue, 
pulsars in red, unclassified sources in green and sources with conflicting classification in yellow. The 
dashed line is the total distribution. Right: Distribution of AGN candidates binned by Galactic 
latitude. The orange line is the sum of the 1FGL AGN associations plus the sources classified as 
AGN candidates (blue line). The dashed line is the distribution for all 1FGL sources. 




Fig. 13. — The 1FGL unassociated sources in the central few degrees of the Galaxy can be mostly 
separated into those classified as pulsars (red) and those that have conflicting classifications or 
could not be classified (green). The few remaining sources were classified as AGN (blue). 



